1 Annotation Guidelines for the Penn Discourse Treebank Part A Eleni Miltsakaki, Rashmi Prasad, Aravind Joshi, Bonnie Webber




Discourse relations (1)

Discourse relations hold between parts of a text. One way of marking a discourse relation is by the use of explicit markers:

• Markers: discourse connectives
• Textual spans they relate: arguments


Example

(1) On the one hand, John loves Barolo.
(2) So he went and ordered three cases.
(3) On the other hand, he didn’t have much money.
(4) So then he had to cancel the order.


Discourse relations (2)

Discourse relations may also hold between adjacent textual spans without an explicit marker; such relations must be inferred.

In such cases, we establish the presence of an implicit connective. Example:

(5a) You should never lend any books to John.
(5b) He never returns them.


Goals of PDTB

To produce a large-scale, reliably annotated corpus that encodes discourse relations associated with discourse connectives, including implicit connectives.


Corpus

• Penn Treebank
• Approx. 1 million words
• Wall Street Journal
• 25 sections
• 100 files in each section


Annotation tasks

Annotation of explicit connectives

Annotation of implicit connectives


Annotation tool

Wordfreak
• Allows you to search for specific connectives
• Keeps a record of connectives and their arguments

More later…


Explicit connectives (1)

Subordinate conjunctions
• ‘because’, ‘although’, ‘when’, etc.
• Arguments found locally
• Subordinate clauses can be preposed

(6a) John failed the exam because he was lazy.
(6b) Because he was lazy, John failed the exam.


Explicit connectives (2)

Coordinate conjunctions
• ‘and’, ‘but’, ‘or’, ‘so’
• Arguments found locally
• Preposing is not allowed

(7a) John is very smart but he failed the exam.
(7b) # But he failed the exam, John is very smart.


Explicit connectives (3)

Adverbials
• ‘therefore’, ‘however’, ‘as a result’, etc.
• One argument found locally
• The other argument may or may not be found locally

(1) On the one hand, John loves Barolo.
(2) So he went and ordered three cases.
(3) On the other hand, he didn’t have much money.
(4) So then he had to cancel the order.


Annotation of explicit conns

We have grouped explicit connectives into sets of 10.

Your task is to:
• Identify all instances of a given set of connectives in the corpus.
• Mark their arguments.

Proceed one file at a time.
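As a rough illustration of the first step (finding every instance of a set of connectives in a file), the sketch below does case-insensitive, whole-word matching with Python's standard `re` module (Python 3.9+). The function name `find_instances` is hypothetical and this is not how Wordfreak searches; it only shows why word boundaries and longest-first matching matter for multi-word connectives.

```python
import re


def find_instances(text: str, connectives: list[str]) -> list[tuple[int, int, str]]:
    """Return (start, end, matched text) for each occurrence of any connective.

    Matching is case-insensitive and respects word boundaries, so "or" does
    not match inside "order". Longer connectives are tried first, so
    "as long as" is reported as one instance rather than as separate "as" hits.
    """
    ordered = sorted(connectives, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(c) for c in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(find_instances("Mary stayed until late, but John left.", ["until", "but", "or"]))
    # [(12, 17, 'until'), (24, 27, 'but')]
```

Every hit is only a candidate: ‘and’ joining two nouns or ‘as’ in a comparison will also match, so each instance still needs the annotator's judgment.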


Sets of explicit connectives

In progress: Set 3
Adverbials
• indeed, for example
Coordinate conj.
• and, but, or
Subordinate conj.
• as soon as, unless, as long as, after, until

In progress: Set 4
Adverbials
• though, yet, so, on the contrary, conversely
Coordinate conj.
• nor
Subordinate conj.
• whereas, as, insofar as, till

Already annotated:
Adverbials
• instead, otherwise, therefore, as a result, nevertheless, in fact, then, on the other hand, however, furthermore/further
Subordinate conj.
• because, although, even though, when, so that, if, while, since

Empty
• Section 00, Section 06
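The two in-progress sets can be written down as a small lookup table. This makes it easy to check that each set really contains 10 connectives and to recall a connective's syntactic class, and hence where to look for its arguments. The names `CONNECTIVE_SETS`, `set_size` and `class_of` are hypothetical; this is a sketch, not a file format used by the project.

```python
from typing import Optional

# Illustrative encoding of the in-progress sets; keys are hypothetical names.
CONNECTIVE_SETS = {
    "set3": {
        "adverbial": ["indeed", "for example"],
        "coordinate": ["and", "but", "or"],
        "subordinate": ["as soon as", "unless", "as long as", "after", "until"],
    },
    "set4": {
        "adverbial": ["though", "yet", "so", "on the contrary", "conversely"],
        "coordinate": ["nor"],
        "subordinate": ["whereas", "as", "insofar as", "till"],
    },
}


def set_size(name: str) -> int:
    """Count the connectives in one set across all syntactic classes."""
    return sum(len(conns) for conns in CONNECTIVE_SETS[name].values())


def class_of(connective: str) -> Optional[str]:
    """Return the syntactic class a connective is listed under, if any.

    The class predicts where arguments are found: locally for
    conjunctions, possibly non-locally for adverbials.
    """
    for classes in CONNECTIVE_SETS.values():
        for cls, conns in classes.items():
            if connective.lower() in conns:
                return cls
    return None


if __name__ == "__main__":
    for name in CONNECTIVE_SETS:
        print(name, set_size(name))  # both sets contain 10 connectives
    print(class_of("nor"))  # coordinate
```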


Implicit connectives

Implicit connectives describe relations that hold between adjacent textual spans and that must be inferred.

In PDTB we will only annotate implicit connectives between sentences in the same paragraph.

We will initially ignore implicit connectives across paragraphs or within a sentence.

(8) John walked across the room, waving at everybody.


Annotation of implicit conns

Conditions:
• The relation holds between two adjacent sentences.
• Both sentences belong to the same paragraph.
• The second sentence does not contain a connective.
! A preposed subordinate conjunction in the second sentence does not count: it relates the clauses of that sentence to each other.

(9) Mary stayed until late. (IMPLICIT) Although she was very tired, she had to finish the report today.

Steps:
• Mark the period as a placeholder for the implicit connective.
• Mark the arguments.
• Provide an explicit connective that best expresses the relation.
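A minimal sketch of the candidate test above, assuming paragraphs are already split into sentences and the connective inventory is given as plain strings. It applies the slide's conditions literally at the surface level, so it over-detects (e.g. ‘so’ used as an intensifier). The function names are hypothetical, and the annotator still decides whether a relation holds and which explicit connective to supply.

```python
import re
from typing import Iterator


def _tokens(sentence: str) -> list[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", sentence.lower())


def _has_connective(sentence: str, connectives: list[str], subordinators: set[str]) -> bool:
    """True if the sentence contains a listed connective, ignoring a
    sentence-initial (preposed) subordinate conjunction."""
    toks = _tokens(sentence)
    for conn in connectives:
        conn_toks = conn.lower().split()
        n = len(conn_toks)
        for i in range(len(toks) - n + 1):
            if toks[i:i + n] == conn_toks:
                if i == 0 and conn.lower() in subordinators:
                    continue  # preposed subordinate conjunction: does not count
                return True
    return False


def implicit_candidates(paragraphs: list[list[str]], connectives: list[str],
                        subordinators: set[str]) -> Iterator[tuple[str, str]]:
    """Yield adjacent sentence pairs within one paragraph whose second
    sentence has no (non-preposed) connective. Pairs that cross a
    paragraph boundary are never formed."""
    for sentences in paragraphs:
        for first, second in zip(sentences, sentences[1:]):
            if not _has_connective(second, connectives, subordinators):
                yield first, second
```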


What is a legal argument

• Multiple sentences
• Sentences
  • Main clause + subordinate clauses
  (10) It is cold, although the sun is shining.
  (11) John walked across the hall, waving his hand cheerfully.
• Clauses
  • Grammatical unit that contains a predicate and its arguments
  • Tensed
  • Non-tensed


Predicates and propositions

Predicates
• Verb
• Says something about the subject
  (12) John is sleeping.
• May require one, two, or three arguments (‘sleep’, ‘eat’, ‘give’)

Propositions (expressions of events or states)
• Predicate and its arguments
• Semantic objects, constant across syntactic variability
  (13) John ate the banana.
  (14) The banana was eaten by John.
  (15) Did John eat the banana?
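To make ‘constant across syntactic variability’ concrete, the sketch below hand-codes examples (13) to (15) as one predicate-argument structure, assuming a toy valency table for the three verbs on the slide. The class and table are hypothetical illustrations, not a parser. Note that (15) asks about the proposition rather than asserting it, but the proposition itself is the same.

```python
from dataclasses import dataclass

# Hypothetical valency table for the three verbs on the slide.
VALENCY = {"sleep": 1, "eat": 2, "give": 3}


@dataclass(frozen=True)
class Proposition:
    """A predicate plus its arguments; frozen, so equal propositions hash equally."""
    predicate: str
    args: tuple[str, ...]

    def __post_init__(self) -> None:
        expected = VALENCY.get(self.predicate)
        if expected is not None and len(self.args) != expected:
            raise ValueError(
                f"{self.predicate!r} takes {expected} argument(s), got {len(self.args)}"
            )


# Hand-coded analyses of (13)-(15): the surface forms differ,
# but each maps to the same predicate-argument structure.
ANALYSES = {
    "John ate the banana.": Proposition("eat", ("John", "the banana")),
    "The banana was eaten by John.": Proposition("eat", ("John", "the banana")),
    "Did John eat the banana?": Proposition("eat", ("John", "the banana")),
}

if __name__ == "__main__":
    print(len(set(ANALYSES.values())))  # 1: one proposition, three sentences
```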


Attention! Discourse relations hold between propositions. When annotating arguments, include a predicate.

(16) Everybody considered Einstein's contribution to be a breakthrough because he discovered the theory of relativity.

Do not separate a predicate from its arguments. BUT: implicit arguments are OK in non-tensed clauses.

(17) John crossed the hall, waving his hand cheerfully.

ALSO: If the only part of the clause left outside your selection is a non-verbal element, include it in your selection.

(18) * In Geneva, however, [they supported Iran’s proposal].


What is not a legal argument

• Textual spans that do not contain (or refer to) propositional material (usually, at minimum, a verb and its arguments).
• Verbs separated from their arguments.
  • You can select a clause that is the argument of a verb, excluding the verb.
  • You cannot select the verb and leave out its arguments.

(19) John said [that Mary left]. OK
(20) [John said] that Mary left. NOT OK
(21) [John said that Mary left]. OK
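The rule ‘never select a verb without its arguments’ can be stated as a check over token indices. In this sketch (hypothetical names, hand-coded analysis of ‘John said that Mary left’), a selection is legal if it contains at least one verb and every selected verb comes with all of its overt arguments. Implicit subjects of non-tensed clauses are simply not listed as arguments, which is how the ‘implicit arguments are OK’ exception shows up; discourse-deictic NPs and nominalizations are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    verb: int             # token index of the verb
    args: frozenset[int]  # token indices of its overt arguments


def is_legal_selection(selection: set[int], predicates: list[Predicate]) -> bool:
    """Legal if the selection contains at least one verb (propositional
    material) and never contains a verb without all its overt arguments."""
    selected = [p for p in predicates if p.verb in selection]
    if not selected:
        return False  # no propositional material
    return all(p.args <= selection for p in selected)


# Tokens of (19)-(21): "John said that Mary left"
#                        0    1    2    3    4
PREDICATES = [
    Predicate(verb=1, args=frozenset({0, 2, 3, 4})),  # said: John, [that Mary left]
    Predicate(verb=4, args=frozenset({3})),           # left: Mary
]

if __name__ == "__main__":
    print(is_legal_selection({2, 3, 4}, PREDICATES))        # (19) True
    print(is_legal_selection({0, 1}, PREDICATES))           # (20) False
    print(is_legal_selection({0, 1, 2, 3, 4}, PREDICATES))  # (21) True
```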


NPs as arguments?!

Discourse deictic expressions are NPs and they may be selected as arguments because they may refer to propositional material.

Discourse deictic expressions are ‘this’ and ‘that’ when they refer to textual spans in the preceding discourse.

(21) ABC is firing 1,000 employees. That (is) because they have huge debts.


What is a legal argument: summary

• A single clause
  [John left]. Because [John left]… While [watching TV]… John wants [to leave].
• A single sentence
  [John wants to leave because he’s sick].
• Multiple sentences
• NPs that refer to clauses
  [This] because…
• Some nominal forms expressing events or states (but make a note)
  After [the sudden price increase]…


What is ARG1/ARG2

The clause that contains the connective is always Arg2.

The other argument of the connective is Arg1. Note that with subordinate conjunctions, Arg2 may precede Arg1.

(22) Because [Arg2 he was sick], [Arg1 John left early].


ARG and SUP annotations

When deciding what to mark as an argument of the connective, you should select what is ‘minimally’ necessary to interpret the relation established by the connective. Mark that as ARG.

This is a good principle to follow.

However, sometimes you may feel you want to mark/include material which provides useful, even if not crucial, information about the interpretation of an argument. Mark this as SUP. SUP annotations are optional.
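One way to picture a finished explicit-connective annotation is as a record of character spans. The sketch below assumes single contiguous spans and invents the field names (`arg1`, `arg2`, `sup1`, `sup2`); it is not Wordfreak's or the PDTB's storage format. It encodes (22) and checks two rules stated above: Arg2 may precede Arg1, and the connective lies outside Arg2. The SUP fields default to empty because SUP annotations are optional.

```python
from dataclasses import dataclass
from typing import Optional

Span = tuple[int, int]  # character offsets, end-exclusive


@dataclass
class ExplicitRelation:
    connective: Span
    arg1: Span
    arg2: Span
    sup1: Optional[Span] = None  # optional supplementary material for Arg1
    sup2: Optional[Span] = None  # optional supplementary material for Arg2

    def arg2_first(self) -> bool:
        """True when Arg2 precedes Arg1, e.g. a preposed subordinate clause."""
        return self.arg2[0] < self.arg1[0]

    def connective_outside_arg2(self) -> bool:
        """Connectives are not part of their own arguments."""
        c_start, c_end = self.connective
        a_start, a_end = self.arg2
        return c_end <= a_start or c_start >= a_end


def span_of(text: str, piece: str) -> Span:
    """Offsets of the first occurrence of piece in text."""
    start = text.index(piece)
    return (start, start + len(piece))


TEXT = "Because he was sick, John left early."  # example (22)
REL = ExplicitRelation(
    connective=span_of(TEXT, "Because"),
    arg1=span_of(TEXT, "John left early"),
    arg2=span_of(TEXT, "he was sick"),
)
```

Discontinuous arguments (for example, around a sentence-medial connective) would need a list of spans per argument instead of a single pair.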


SUP: Example 1

Lawyers and their clients who frequently bring business to a country courthouse can expect to appear before the same judge year after year. [Fear of alienating that judge is pervasive], says Maurice Geiger, founder and director of the Rural Justice Center in Montpellier, Vt., a public interest group that researches rural justice issues.

As a result, lawyers think twice before appealing a judge’s ruling, are reluctant to mount, or even support, challenges against him for reelection and usually loath to file complaints that might impugn a judge’s integrity.


SUP: Example 2

While dividends have risen smartly, [their expansion hasn’t kept pace with even stronger advances in stock prices].


Connectives are not part of their arguments

When annotating the second argument of a connective do not include the connective itself.

(23) He failed the exam although he had studied hard.

Connectives may appear in an argument of a connective that you are annotating. Include that connective in the selection of the argument.

(24) When the stock market dropped nearly 7% Oct. 13, for instance, the Mexico Fund plunged about 18% and the Spain Fund fell 16%.


What about sentence medial connectives?

If a connective is sentence-medial, exclude it from your selection of the argument.

Wordfreak allows you to select discontinuous text and enter it as a single argument.


Using the discontinuous text selection feature in the tool

On-line demo

Basic steps:
• Press Control and select span 1.
• Keeping Control pressed, select spans 2, 3, etc.
• Then click on the Arg button to enter your selection.
• All selected spans will show up in the Arg window in the order in which they were selected.


Examples with discontinuous text selections

Connectives (25) In Geneva, however, they supported Iran’s proposal.

Modifications (26) Mary, who is a friend of mine, just arrived in Philadelphia.

Parentheticals (27) The price of the stock -many had expected this- was rising.
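The effect of a discontinuous selection can be sketched offline: cut the excluded material out of the sentence and keep what is left, in textual order. Which material is excluded in each example is an assumption here (the connective in (25), the relative clause in (26), the parenthetical in (27)), as is the rule for trimming edge punctuation. The helper name is hypothetical; in the tool, the same result is obtained interactively with Control-selection, and spans are listed in selection order rather than textual order.

```python
# Edge characters dropped from each remaining span (assumed trimming rule).
TRIM = set(" ,.-")


def discontinuous_spans(sentence: str, excluded: list[str]) -> list[tuple[int, int, str]]:
    """Return (start, end, text) for the pieces left after cutting out
    each excluded string (first occurrence only), in textual order."""
    cuts = []
    for piece in excluded:
        start = sentence.index(piece)  # raises ValueError if absent
        cuts.append((start, start + len(piece)))
    cuts.sort()
    cuts.append((len(sentence), len(sentence)))  # sentinel: closes the last span

    spans = []
    pos = 0
    for cut_start, cut_end in cuts:
        start, end = pos, cut_start
        while start < end and sentence[start] in TRIM:
            start += 1
        while end > start and sentence[end - 1] in TRIM:
            end -= 1
        if start < end:
            spans.append((start, end, sentence[start:end]))
        pos = cut_end
    return spans


if __name__ == "__main__":
    s25 = "In Geneva, however, they supported Iran’s proposal."
    print([t for _, _, t in discontinuous_spans(s25, ["however"])])
    # Applied to (24), assuming the connective annotated is 'for instance':
    # the embedded connective 'When' stays inside the first span.
    s24 = ("When the stock market dropped nearly 7% Oct. 13, for instance, "
           "the Mexico Fund plunged about 18% and the Spain Fund fell 16%.")
    print([t for _, _, t in discontinuous_spans(s24, ["for instance"])])
```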


What not to annotate!

Do not annotate connectives that are followed by a preposition.

Out:
(28) Instead of teaming up, GE Capital staffers and Kidder investment bankers have bickered.

In:
(29) The Hopkinsian universal disinterested benevolence, although holding to original sin and the doctrine of election, inspired its adherents to heroic endeavours for others, ...
(30) Its 1,400-member brokerage operation reported an estimated $5 million loss last year, although Kidder expects it to turn a profit this year.
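The ‘followed by a preposition’ test can be approximated by looking at the first word after the connective. The preposition list below is a hypothetical, deliberately short sample, and the check is only a flag for review: a clause that happens to begin with a listed preposition (e.g. ‘although with rates rising, …’) would also be flagged. The helper names are illustrative only.

```python
import re

# Hypothetical, deliberately small preposition list for illustration.
PREPOSITIONS = {"of", "to", "for", "from", "with"}


def connective_end(text: str, connective: str) -> int:
    """End offset of the first case-insensitive occurrence of connective."""
    return text.lower().index(connective.lower()) + len(connective)


def followed_by_preposition(text: str, conn_end: int) -> bool:
    """True if the first word after offset conn_end is a listed preposition."""
    m = re.match(r"\s*([A-Za-z]+)", text[conn_end:])
    return m is not None and m.group(1).lower() in PREPOSITIONS


if __name__ == "__main__":
    s28 = "Instead of teaming up, GE Capital staffers and Kidder investment bankers have bickered."
    print(followed_by_preposition(s28, connective_end(s28, "instead")))  # True: do not annotate
```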


Connectives and relations

What discourse relation can ‘then’ express?

What other discourse relation can ‘then’ express?

Name a connective that expresses a ‘contrastive’ relation.

Name another connective that expresses a ‘contrastive’ relation.

What other discourse relations are there?


Practice: Test your understanding of legal arguments

1. When Sophie and Joanna got to the supermarket they went their separate ways.
2. At the end of the road there was a sharp bend, known as Captain’s Bend.
3. People seldom went that way except on the weekend.
4. Sophie tried to imagine herself shaking hands and introducing herself as Lillemor Amundsen, but it seemed all wrong. It was someone else who kept introducing himself.
5. ‘I’m Sophie Amundsen,’ she said.
6. Sophie tried to beat her reflection to it with a lightning movement but the girl was just as fast.
7. Sophie pressed her index finger to the nose in the mirror and said, ‘You are me.’ As she got no answer to this, she turned the sentence around and said, ‘I’m you.’


Wordfreak

On-line demonstration