Discourse Connectives and Their Argument Structure: Annotating a discourse treebank ARAVIND K. JOSHI Department of Computer and Information Science October 30 2003

Discourse Connectives and Their Argument Structure: Annotating a discourse treebank

ARAVIND K. JOSHI
Department of Computer and Information Science

October 30 2003

Outline

• Introduction

• Some properties of discourse connectives

• Some example annotations (preliminary) with comments

Introduction

• Extending the notion of lexical anchors (such as verbs) and their arguments beyond sentences into discourse

• Discourse connectives, such as and, or, but, because, since, while, when, however, instead, although, also, for example, then, so that, insofar as, nonetheless, …, and empty connectives, take clauses as their arguments and express relations between clauses, i.e., relations between the propositions, events, situations, … associated with the clauses

• Towards computing a class of inferences associated with discourse connectives, hence relevant to complex NLP tasks: IE, MT, QA, …

• Towards discourse structure and discourse understanding

Some properties of discourse connectives

• Discourse connectives have argument structure (analogous to verbs and their argument structure), as in the Propbank. However, there are crucial differences:

• The arity of connectives is fixed: they are binary (with some apparent exceptions)

• One argument is in the same sentence in which the connective appears. The other argument may or may not be in the same sentence; it can be in the preceding or following discourse

• It is harder to annotate the extent of an argument

• One of the arguments can be anaphoric

• Very little is known about the semantics of discourse connectives

Some properties of discourse connectives

• Detailed annotation of the argument structure for a large corpus is providing new insights into the semantics of connectives

• No known abstract semantic categories such as agent, patient, theme, etc. for discourse connectives -- New opportunities

• At present, arguments are labeled by noncommittal labels: Cc for the clause containing the connective, and Cc’ for the clause not containing the connective

• Example of semantics:

John flunked the exam although he studied hard

Cc’ although Cc

(Cc normally entails ~Cc’) & Cc’
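
The labeling scheme above can be made concrete with a small sketch. It is illustrative only: the class name `ConnectiveAnnotation`, its fields, and the helper `concession_gloss` are hypothetical names introduced here, not part of the project's annotation tools. The sketch stores the "although" example under the noncommittal labels Cc and Cc’ and spells out the gloss (Cc normally entails ~Cc’) & Cc’ as a string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectiveAnnotation:
    """Hypothetical record for one binary connective annotation."""
    connective: str
    cc: str        # Cc: the clause containing the connective
    cc_prime: str  # Cc': the clause not containing the connective


def concession_gloss(ann: ConnectiveAnnotation) -> str:
    """Spell out the slide's gloss for 'although': (Cc normally entails ~Cc') & Cc'."""
    return f"({ann.cc} normally entails ~({ann.cc_prime})) & ({ann.cc_prime})"


# The example from the slide: "John flunked the exam although he studied hard"
example = ConnectiveAnnotation(
    connective="although",
    cc="he studied hard",
    cc_prime="John flunked the exam",
)

print(concession_gloss(example))
```

The record deliberately has exactly two argument slots, reflecting the claim that connectives are binary; nothing here attempts to compute the entailment itself.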

Research Strategy

Not shallow vs deep syntactic processing

Not shallow vs deep semantic processing

But

Deeper and deeper shallow processing

Subordinate: because

[The federal government suspended sales of U.S. savings bonds] because [Congress hasn’t lifted the ceiling on government debt.]

• Both arguments are in the same sentence

Adverbial: however

[Both Newsweek and U.S. News have been gaining circulation in recent years without heavy use of electronic giveaways to subscribers, such as telephone or watches.] However, [none of the big three weeklies recorded circulation gains recently.]

• The two arguments are in different sentences

Adverbial: for example

[The computers were crude by today’s standards.] [Apple II owners, for example, had to use their television sets as screens and stored data on audiocassettes.]

• An argument can be a discontiguous string

• Problems with aligning arguments with Penn Treebank constituents
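
One way to picture a discontiguous argument is as a list of character spans rather than a single span. The sketch below is an assumption-laden illustration, not the project's annotation format: the helper `spans_excluding` is hypothetical, offsets are plain character positions, and it assumes the connective "for example" is left out of the argument it interrupts.

```python
# Sentence from the slide's "for example" annotation.
sentence = (
    "Apple II owners, for example, had to use their television "
    "sets as screens and stored data on audiocassettes."
)
connective = "for example"


def spans_excluding(text: str, excluded: str) -> list[tuple[int, int]]:
    """Return (start, end) character spans covering text on either side of
    the first occurrence of `excluded`; one span if it does not occur."""
    start = text.find(excluded)
    if start < 0:
        return [(0, len(text))]
    end = start + len(excluded)
    return [(0, start), (end, len(text))]


# The argument is recorded as two spans that skip over the connective.
argument_spans = spans_excluding(sentence, connective)
argument_pieces = [sentence[s:e] for s, e in argument_spans]

for span, piece in zip(argument_spans, argument_pieces):
    print(span, repr(piece))
```

Storing spans instead of a constituent label sidesteps, rather than solves, the alignment problem with Penn Treebank constituents noted above.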

Adverbial: instead

[No price for the new shares has been set.] Instead, [the companies will leave it up to the marketplace to decide.]

• “No” is not a part of the left argument

• The left argument must indicate the unselected alternative, and the right argument indicates the selected alternative

• Negation is the licensing context for the left argument

* [Price for the new shares has been set.] Instead, [the companies will leave it up to the marketplace to decide.]

• Modalities and non-factivity are other licensing contexts

John wanted [to go to New York.] Instead, [he went to Washington.]

Adverbial: still

[Some senior advisors argue that with further fights over a capital-gains tax cut and a budget-reduction bill Mr. Bush already has enough pending confrontations with Congress. They prefer to put off the line-item veto until at least next year.] Still, [Mr. Bush and some other aides are strongly drawn to the idea of trying out a line-item veto.]

• The left argument has two sentences

Adverbial: also

[On the Big Board, Crawford & Co., Atlanta, (CFD) begins trading today.] Crawford evaluates health care plans, manages medical and disability aspects of worker’s compensation injuries and is involved in claims adjustments for insurance companies. Also, [beginning trading today on the Big Board are El Paso Refinery Limited Partnership, El Paso, Texas, (ELP) and Franklin Multi-Income Trust, San Mateo, Calif., (FMI).]

• The sentence after the left argument of “also” (shown in blue on the slide, beginning “Crawford evaluates”) can be regarded as a kind of adjunct of the left argument

• Discourse connectives have a fixed arity (2) and no adjuncts

Empty connective: EMPTY

[El Paso owns and operates a petroleum refinery.] EMPTY = whereas [Franklin is a closed-end management investment company.]

• “whereas” is the connective that one annotator thought best described the relation expressed by the empty connective

• Analogous to the empty relation in a noun-noun compound at the sentence level

How many discourse connectives in PTB?

Types: about 253

(Subordinating: 32, Coordinating: 4, Adverbial/Anaphoric: 217)

Tokens: about 23,620

(Subordinating: 7011, Coordinating: 6169, Adverbial/Anaphoric: 10,440)

Empty connectives: Tokens: about 20,000, Types: ??

Total: Tokens: 43,620
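
As a quick arithmetic check, the per-class figures on this slide add up to the stated totals. The snippet below only restates the slide's numbers; the variable names are made up for illustration, and all counts remain approximate ("about") as on the slide.

```python
# Counts copied from the slide (all approximate).
type_counts = {"Subordinating": 32, "Coordinating": 4, "Adverbial/Anaphoric": 217}
token_counts = {"Subordinating": 7011, "Coordinating": 6169, "Adverbial/Anaphoric": 10440}
empty_connective_tokens = 20000

explicit_types = sum(type_counts.values())
explicit_tokens = sum(token_counts.values())
total_tokens = explicit_tokens + empty_connective_tokens

print(explicit_types, explicit_tokens, total_tokens)
```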

How does PDTB differ from existing discourse annotations, such as the RST-annotated corpus (Carlson, Marcu, and Okurowski, 2003, to appear)?

• PDTB marks the discourse relations associated with lexical connectives (explicit and implicit), including their argument structure and anaphoric links, thus exposing a clearly defined level of discourse structure

• The existing RST-annotated corpus contains no record of the basis on which a rhetorical relation is assigned

• RST attempts to provide a very high-level annotation, which leads to low inter-annotator agreement

• The RST corpus is only 1/5 of PTB

• Relating the two annotations at a later stage will be useful

Project:

• Annotate discourse connectives and their argument structure for the Penn Treebank corpus

• Discourse Lexicalized TAG parser (DLTAG)

People: Eleni Miltsakaki, Rashmi Prasad, Annotators, Aravind Joshi

Collaborator: Bonnie Webber (Edinburgh University)

Consultants: Mitch Marcus, Martha Palmer, Ellen Prince, Fernando Pereira