Proposition Bank: a resource of predicate-argument relations
Martha Palmer
University of Pennsylvania
October 9, 2001
Columbia University


Outline

Overview (ACE consensus: BBN, NYU, MITRE, Penn)
Motivation
Approach
• Guidelines, lexical resources, frame sets
• Tagging process, hand correction of automatic tagging
Status: accuracy, progress
Colleagues: Joseph Rosenzweig, Paul Kingsbury, Hoa Dang, Karin Kipper, Scott Cotton, Laren Delfs, Christiane Fellbaum


Proposition Bank: Generalizing from Sentences to Propositions

Powell met Zhu Rongji
Powell met with Zhu Rongji
Powell and Zhu Rongji met
Powell and Zhu Rongji had a meeting
. . .
Proposition: meet(Powell, Zhu Rongji)

debate, consult, join, wrestle, battle
meet(Somebody1, Somebody2)

When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu)
discuss([Powell, Zhu], return(X, plane))
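The generalization on this slide can be sketched in a few lines of Python. This is only an illustration: the `Proposition` class and the `VARIANTS` table are hypothetical names, and the sentence-to-proposition analyses are written by hand rather than produced by any tagger.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Proposition:
    """A predicate with its ordered arguments, e.g. meet(Powell, Zhu Rongji)."""
    predicate: str
    args: Tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.predicate}({', '.join(self.args)})"


# Hand-written analyses of the slide's surface variants (an assumption for
# illustration, not the output of an automatic tagger).
MEET = Proposition("meet", ("Powell", "Zhu Rongji"))
VARIANTS = {
    "Powell met Zhu Rongji": MEET,
    "Powell met with Zhu Rongji": MEET,
    "Powell and Zhu Rongji met": MEET,
    "Powell and Zhu Rongji had a meeting": MEET,
}

# The four different sentences collapse to a single proposition.
distinct = set(VARIANTS.values())
print(len(distinct), next(iter(distinct)))
```

Because the dataclass is frozen, propositions are hashable and compare by value, so syntactic variants that share one analysis count as one item in the set.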


Penn English Treebank

1.3 million words
Wall Street Journal and other sources
Tagged with Part-of-Speech
Syntactically Parsed
Widely used in NLP community
Available from Linguistic Data Consortium


A TreeBanked Sentence

Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
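The bracketed notation above is easy to read programmatically. The following is a minimal sketch, not the Treebank tooling itself: `parse_brackets` and `leaves` are hypothetical helper names, and the parser assumes well-formed input with balanced parentheses.

```python
import re


def parse_brackets(text):
    """Parse Treebank-style bracketing into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("a node must start with '('")
        pos += 1
        node = [tokens[pos]]  # constituent label, e.g. NP-SBJ
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read_node())  # nested constituent
            else:
                node.append(tokens[pos])  # a word or a trace such as *T*-1
                pos += 1
        pos += 1  # consume the closing ')'
        return node

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def leaves(node):
    """Collect the terminal strings of a parsed tree, left to right."""
    words = []
    for child in node[1:]:
        words.extend(leaves(child) if isinstance(child, list) else [child])
    return words


TREE = """
(S (NP-SBJ Analysts)
   (VP have (VP been (VP expecting
     (NP (NP a GM-Jaguar pact)
         (SBAR (WHNP-1 that)
           (S (NP-SBJ *T*-1)
              (VP would
                (VP give (NP the U.S. car maker)
                         (NP (NP an eventual (ADJP 30 %) stake)
                             (PP-LOC in (NP the British company))))))))))))
"""

tree = parse_brackets(TREE)
print(tree[0], "|", " ".join(leaves(tree)))
```

The leaf sequence includes the trace `*T*-1`, which is what later lets the relative clause's subject be linked back to "a GM-Jaguar pact".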


The same sentence, PropBanked

(S Arg0 (NP-SBJ Analysts)
   (VP have (VP been (VP expecting
     Arg1 (NP (NP a GM-Jaguar pact)
              (SBAR (WHNP-1 that)
                (S Arg0 (NP-SBJ *T*-1)
                   (VP would
                     (VP give Arg2 (NP the U.S. car maker)
                              Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                       (PP-LOC in (NP the British company))))))))))))

have been expecting: Arg0 = Analysts; Arg1 = a GM-Jaguar pact
that would give: Arg0 = *T*-1; Arg2 = the US car maker; Arg1 = an eventual 30% stake in the British company

expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)


Motivation

Why do we need accurate predicate-argument relations?
They have a major impact on Information Processing.
Ex: Korean/English Machine Translation: ARL/SBIR
• CoGenTex, Penn, Systran (K/E Bilingual Lexicon, 20K)
• 4K words ( < 500 words from Systran, military messages)
• Plug and play architecture based on DsyntS (rich dependency structure)
• Converter bug led to random relabeling of predicate arguments
• Correction of predicate argument labels alone led to tripling of acceptable sentence output


Focusing on Parser comparisons

200 sentences hand selected to represent “good” translations given a correct parse.
Used to compare:
• Corrected DsyntS output
• Juntae’s parser output (off-the-shelf)
• Anoop’s parser output (Treebank trained, 95% F)


Evaluating translation quality

Compare DLI Human translation to system output (200)
Criteria used by human judges (2 or more, not blind)
• [g] = good, exactly right
• [f1] = fairly good, but small grammatical mistakes
• [f2] = Needs fixing, but vocabulary basically there
• [f3] = Needs quite a bit of fixing, usually some un-translated vocabulary, but most vocabulary is right
• [m] = seems grammatical, but semantically wrong, actually misleading
• [i] = irredeemable, really wrong, major problems


Results Comparison (200 sentences)

           Anoop   Juntae   Correct
Bad          5       9        3
Fixable     85      67       11
Good        10      24       85


Plug and play?

Converter used to map Parser outputs into MT DsyntS format
• Bug in the converter affected both systems
• Predicate argument structure labels were being lost in the conversion process, relabeled randomly
The converter was also still tuned to Juntae’s parse output, needed to be customized to Anoop’s


Anoop’s parse -> MTW DsyntS

0010 Target: Unit designations are normally transmitted in code.
0010 Corrected: Normally unit designations are notified in the code.
0010 Anoop: Normally it is notified unit designations in code.

[Figure: dependency tree headed by "notified" over "unit designations", "normally" and "code"; arc label C = Arg1, P = Arg0]


Anoop’s parse -> MTW DsyntS

0022 Target: Under what circumstances does radio interference occur?
0022 Corrected: In what circumstances does the interference happen in the radio?
0022 Anoop: Do in what circumstance happen interference in radio?

[Figure: dependency tree headed by "happen" over "what circumstances", "radio" and "interference"; arc labels C = Arg0, P = ArgM and C = Arg1, P = Arg0]


New and Old Results Comparison (chart axis: 0% to 100%)

           A2     A1     J2     J1    Correct
Bad        4.5    5      4      9      3
Fixable   60.5   85     64.5   67     11
Good      37     10     31     24     85
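The size of the change in the "Good" row can be checked with a little arithmetic. This sketch assumes that A1 and J1 are the old runs (their values match the earlier results slide for Anoop and Juntae) and that A2 and J2 are the new runs; the `results` table and `good_ratio` helper are hypothetical names introduced for illustration.

```python
# Values transcribed from the chart: (Bad, Fixable, Good) per run.
results = {
    "A1": (5, 85, 10),
    "A2": (4.5, 60.5, 37),
    "J1": (9, 67, 24),
    "J2": (4, 64.5, 31),
    "Correct": (3, 11, 85),
}


def good_ratio(old, new):
    """Ratio of the 'Good' figure in the new run to the 'Good' figure in the old run."""
    return results[new][2] / results[old][2]


print(f"Anoop:  {good_ratio('A1', 'A2'):.2f}x")
print(f"Juntae: {good_ratio('J1', 'J2'):.2f}x")
```

Under these assumptions the "Good" figure for Anoop's parser goes from 10 to 37, while Juntae's goes from 24 to 31.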


English PropBank

1M words of Treebank over 2 years, May’01-03
New semantic augmentations
• Predicate-argument relations for verbs
• label arguments: Arg0, Arg1, Arg2, …
• First subtask, 300K word financial subcorpus (12K sentences, 35K+ predicates)
Spin-off: Guidelines (necessary for annotators)
• English lexical resource
• 6000+ verbs with labeled examples, rich semantics


Task: not just undoing passives

The earthquake shook the building. <arg0> <WN3> <arg1>

The walls shook; the building rocked. <arg1> <WN3>; <arg1> <WN1>

The guidelines = lexicon with examples: Frames Files


Guidelines: Frames Files

Created manually – Paul Kingsbury
• working on semi-automatic expansion
Refer to VerbNet, WordNet and FrameNet
Currently in place for 230 verbs
• Can expand to 2000+ using VerbNet
• Will need hand correction
Use “semantic role glosses” unique to each verb (map to Arg0, Arg1 labels appropriate to class)


Frames Example: expect

Roles:
  Arg0: expecter
  Arg1: thing expected

Example: Transitive, active:
Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL: expect
  Arg1: further declines in interest rates


Frames File example: give

Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to

Example: double object
The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation
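A frames file entry like the one for "give" can be modeled as a small lookup from argument labels to verb-specific role glosses. The sketch below is an assumption about one convenient in-memory shape; `Roleset` and `gloss` are hypothetical names and do not reflect the project's actual file format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Roleset:
    """Verb-specific role glosses keyed by argument label (Arg0, Arg1, ...)."""
    verb: str
    roles: Dict[str, str]

    def gloss(self, labeled: List[Tuple[str, str]]) -> List[str]:
        """Attach the verb-specific gloss to each labeled span of a sentence."""
        out = []
        for label, span in labeled:
            if label == "REL":
                out.append(f"REL: {span}")
            elif label in self.roles:
                out.append(f"{label} ({self.roles[label]}): {span}")
            else:
                raise KeyError(f"{label} is not a role of {self.verb}")
        return out


# Roles and example taken from the slide.
GIVE = Roleset("give", {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"})
example = [
    ("Arg0", "The executives"),
    ("REL", "gave"),
    ("Arg2", "the chefs"),
    ("Arg1", "a standing ovation"),
]

for line in GIVE.gloss(example):
    print(line)
```

Keeping the labels generic (Arg0, Arg1, Arg2) and the glosses per verb mirrors the slide: in the double-object example the recipient appears before the thing given, yet the labels stay the same.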


The same sentence, PropBanked

have been expecting: Arg0 = Analysts; Arg1 = a GM-Jaguar pact
that would give: Arg0 = *T*-1; Arg2 = the US car maker; Arg1 = an eventual 30% stake in the British company

expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)


Complete Sentence

Analysts have been expecting a GM-Jaguar pact that *T*-1 would give the U.S. car maker an eventual 30% stake in the British company and create joint ventures that *T*-2 would produce an executive-model range of cars.


How are arguments numbered?

Examination of example sentences
Determination of required / highly preferred elements
Sequential numbering, Arg0 is typical first argument, except ergative/unaccusative verbs (shake example)
Arguments mapped for "synonymous" verbs


Additional tags (arguments or adjuncts?)

Variety of ArgM’s (Arg#>4):
• TMP - when?
• LOC - where at?
• DIR - where to?
• MNR - how?
• PRP - why?
• REC - himself, themselves, each other
• PRD - this argument refers to or modifies another
• ADV - others


Tense/aspect

Verbs also marked for tense/aspect
• Passive
• Perfect
• Progressive
• Infinitival
Modals and negation marked as ArgMs


Ergative/Unaccusative Verbs: rise

Roles

Arg1 = Logical subject, patient, thing rising

Arg2 = EXT, amount risen

Arg3* = start point

Arg4 = end point

Sales rose 4% to $3.28 billion from $3.16 billion.

*Note: Have to mention prep explicitly, Arg3-from, Arg4-to, or could have used ArgM-Source, ArgM-Goal. Arbitrary distinction.
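The roles for "rise" can be applied to the example sentence as follows. This is a hand-labeled sketch: the span assignments are an illustrative reading of the role list, the label spelling `Arg2-EXT` is an assumption, and `check_prepositions` is a hypothetical helper that only checks the note's convention of naming the preposition in the label.

```python
# Roles for "rise" as listed on the slide; Arg3 and Arg4 name their preposition.
RISE_ROLES = {
    "Arg1": "thing rising",
    "Arg2-EXT": "amount risen",
    "Arg3-from": "start point",
    "Arg4-to": "end point",
}

# Hand-labeled spans for "Sales rose 4% to $3.28 billion from $3.16 billion."
labeled = [
    ("Arg1", "Sales"),
    ("REL", "rose"),
    ("Arg2-EXT", "4%"),
    ("Arg4-to", "to $3.28 billion"),
    ("Arg3-from", "from $3.16 billion"),
]


def check_prepositions(spans):
    """Return spans whose ArgN-prep label does not match the span's first word."""
    problems = []
    for label, text in spans:
        _, _, suffix = label.partition("-")
        # Lower-case suffixes are prepositions; upper-case ones (EXT) are tags.
        if suffix and suffix.islower() and not text.lower().startswith(suffix + " "):
            problems.append((label, text))
    return problems


print(check_prepositions(labeled))
```

Note that the surface order puts the end point (Arg4-to) before the start point (Arg3-from); the labels follow the roles, not the word order.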


Synonymous Verbs: add in sense rise

Roles:

Arg1 = Logical subject, patient, thing rising/gaining/being added to

Arg2 = EXT, amount risen

Arg4 = end point

The Nasdaq composite index added 1.01 to 456.6 on paltry volume.


Phrasal Verbs

Put together
Put in
Put off
Put on
Put out
Put up
...


Frames: Multiple Rolesets

Rolesets are not necessarily consistent between different senses of the same verb
• Verb with multiple senses can have multiple frames, but not necessarily
Roles and mappings onto argument labels are consistent between different verbs that share similar argument structures, similar to FrameNet
• Levin / VerbNet classes
• http://www.cis.upenn.edu/~dgildea/VerbNet/
Out of the 179 most frequent verbs:
• 1 roleset – 92
• 2 rolesets – 45
• 3+ rolesets – 42 (includes light verbs)


Annotation procedure

Extraction of all sentences with given verb
First pass – automatic tagging
Second pass: Double blind hand correction
• Variety of backgrounds
• less syntactic training than for treebanking
Script to discover discrepancies
Third pass: Solomonization (adjudication)
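The discrepancy step can be sketched as a comparison of two annotators' label-to-span mappings. This is not the project's script: `discrepancies` is a hypothetical function, and the data is the "keep" example from the deck's final Solomonization slide, where the two annotators differ only in the ArgM tag.

```python
def discrepancies(a, b):
    """Return {label: (a's span, b's span)} for every label where a and b differ.

    Each annotation maps an argument label to the text span it covers;
    a missing label is reported as None.
    """
    diffs = {}
    for label in sorted(set(a) | set(b)):
        if a.get(label) != b.get(label):
            diffs[label] = (a.get(label), b.get(label))
    return diffs


kate = {
    "argM-MNR": "relative to earnings growth",
    "arg3-PRD": "flat",
    "arg1": "its tax outlay",
    "arg0": "the company",
}
katherine = {
    "argM-ADV": "relative to earnings growth",
    "arg3-PRD": "flat",
    "arg1": "its tax outlay",
    "arg0": "the company",
}

for label, (first, second) in discrepancies(kate, katherine).items():
    print(label, "|", first, "|", second)
```

Only the labels that differ are passed on, so the adjudicator in the third pass sees the MNR versus ADV disagreement and nothing else for this sentence.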


Inter-annotator agreement

[Chart: value per verb, axis 0 to 100]
Buy 48, Begin 70, Bid 70, Base 46, See 34, End 84, Cost 67, Keep 52, Sell 52, Leave 50,
Announce 87, Close 80, Decline 53, Call 59, Tell 18, Want 75, Comment 92, Gain 29, Name 41, Seem 83,
Offer 43, Know 61, Add 51, Compare 91, Hit 57, Result 83, Believe 11, Find 61, Quote 100, Earn 90,
Want 75, Bring 39, Fall 76, Work 63, Approve 81, Elect 75, Cause 55, Resign 82, Result 82, Return 73,
Climb 62, Change 84


Annotator Accuracy vs. Gold Standard

Annotators: Darren, Erwin, Kate, Katherine (two scores per verb)

Verb       Scores
Acquire    85%  96%
Add        86%  93%
Announce   90%  99%
Bid        50%  95%
Cost       78%  89%
Decline    96%  61%
Hit        96%  60%
Keep       92%  53%
Know       89%  69%

One version of annotation chosen (sr. annotator)
Solomon modifies => Gold Standard


Status

179 verbs framed (+ Senseval2 verbs)
97 verbs first-passed
• 12,300+ predicates
• Does not include ~3000 predicates tagged for Senseval
54 verbs second-passed
• 6600+ predicates
9 verbs solomonized
• 885 predicates


Throughput

Framing: approximately 2 verbs per hour
Annotation: approximately 50 sentences per hour
Solomonization: approximately 1 hour per verb


Automatic Predicate Argument Tagger

Predicate argument labels
• Uses TreeBank “cues”
• Consults lexical semantic KB
  - Hierarchically organized verb subcategorization frames and alternations associated with tree templates
  - Ontology of noun-phrase referents
  - Multi-word lexical items
• Matches annotated tree templates against parse in Tree-adjoining Grammar style
• standoff annotation in external file referencing tree nodes
Preliminary accuracy rate of 83.7% (800+ predicates)


Summary

Predicate-argument structure labels are arbitrary to a certain degree, but still consistent, and generic enough to be mappable to particular theoretical frameworks

Automatic tagging as a first pass makes the task feasible
Agreement and accuracy figures are reassuring


Solomonization

Source tree: Intel told analysts that the company will resume shipments of the chips within two to three weeks .
*** kate said:
arg0 : Intel
arg1 : the company will resume shipments of the chips within two to three weeks
arg2 : analysts
*** erwin said:
arg0 : Intel
arg1 : that the company will resume shipments of the chips within two to three weeks
arg2 : analysts


Solomonization

Such loans to Argentina also remain classified as non-accruing, *TRACE*-1 costing the bank $ 10 million *TRACE*-*U* of interest income in the third period.
*** kate said:
argM-TMP : in the third period
arg3 : the bank
arg2 : $ 10 million *TRACE*-*U* of interest income
arg1 : *TRACE*-1
*** erwin said:
argM-TMP : in the third period
arg3 : the bank
arg2 : $ 10 million *TRACE*-*U* of interest income
arg1 : *TRACE*-1 Such loans to Argentina


Solomonization

Also , substantially lower Dutch corporate tax rates helped the company keep its tax outlay flat relative to earnings growth.
*** kate said:
argM-MNR : relative to earnings growth
arg3-PRD : flat
arg1 : its tax outlay
arg0 : the company
*** katherine said:
argM-ADV : relative to earnings growth
arg3-PRD : flat
arg1 : its tax outlay
arg0 : the company