Human summary production operations for computer-aided summarisation
Laura Hasler, University of Wolverhampton
30 May 2007
Overview
• Original contributions of my thesis
• Human summarisation (HS)
• Automatic summarisation (AS)
• Computer-aided summarisation (CAS)
• Classification of human summary production operations
• Guidelines derived from the classification
• Evaluation of guidelines and classification
Original contributions
• Reliable ways of creating abstracts from extracts, improving coherence/readability
• Set of guidelines for annotating important information in source texts, producing the extracts for a corpus of extract/abstract pairs
• Corpus of extract/abstract pairs for analysis
• Corpus-based classification of human summary production operations that successfully transform extracts into abstracts by improving coherence and readability
Original contributions 2
• Set of summary production guidelines derived from classification which can be issued to users of a CAS system
• Development of Centering Theory (Grosz, Joshi & Weinstein 1995) as an evaluation metric, since existing methods were unsuitable
• Evaluation of the coherence and readability of abstracts produced using the summary production operations, and therefore of the guidelines and operations themselves
Human summarisation: 3 stages (Endres-Niggemeyer 1998)
• Document exploration: summariser explores layout and organisation of document to identify position of important information
• Relevance assessment: summariser assesses information in document to see if it is relevant to summary by recognising the theme (what it is ‘about’)
• Summary production: summariser cuts and pastes relevant information from document and edits it to form a coherent summary
Automatic summarisation
Extracting
• Units extracted from source verbatim, leading to problems with coherence and unnecessary information
• Methods can easily be used across domains
• Currently more popular (e.g. CAST)
Abstracting
• Additional knowledge (concepts) can be used
• Not restricted to linguistic realisation of source, so more coherent and concise
• Needs a knowledge base, so domain dependent
Computer-aided summarisation
• A feasible alternative to fully automatic summarisation given current technology, since automatic extracts have problems of coherence and readability
• Uses automatic summarisation methods to produce an extract (stages 1 and 2), which is then post-edited by a human summariser/user (stage 3)
• Focus of this research: post-editing (extract to abstract) to improve coherence/readability
Aim of the research
A) Chernobyl reactor number 4 was ripped apart by an explosion on 26 April 1986. Last September, the IAEA and the WHO released a report. Its headline conclusion that radiation from the accident would kill a total of 4000 people was widely reported.
B) Last September, the IAEA/WHO released a report on the explosion of Chernobyl reactor number 4 on 26 April 1986, concluding that radiation from the accident would kill a total of 4000 people. (h03-ljh)
How can we consistently transform extracts into abstracts?
• Guidelines: available for other aspects/types of summarisation
• Investigation of what exactly a human summariser does to get from an extract to an abstract (and improve coherence)
• Corpus to allow analysis and classification
• Set of guidelines derived from classification
• Application and evaluation of classification/guidelines to prove they work
Corpus of extract/abstract pairs
• 43 pairs of news texts (extract, abstract)
• Source texts manually annotated for important information, giving higher-quality extracts
• Annotated using adapted CAST guidelines (Hasler et al. 2003) to produce 30% extracts
• Extracts transformed into 20% abstracts, with no guidelines given
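To make the two compression rates concrete, the arithmetic below assumes both percentages are measured in words against the length of the source text (the slides do not say which unit or base is used). Under that assumption, turning a 30% extract into a 20% abstract means cutting about a third of the extract. The 600-word source is a hypothetical figure and `target_length` is an illustrative helper.

```python
def target_length(source_words: int, rate: float) -> int:
    """Summary length in words at a given compression rate (assumed word-based)."""
    return round(source_words * rate)


EXTRACT_RATE = 0.30   # 30% extracts, as annotated
ABSTRACT_RATE = 0.20  # 20% abstracts; assumed relative to the source, not the extract

source_words = 600  # hypothetical source length
extract_words = target_length(source_words, EXTRACT_RATE)
abstract_words = target_length(source_words, ABSTRACT_RATE)
cut_fraction = 1 - abstract_words / extract_words  # share of the extract that must go

print(extract_words, abstract_words)                 # 180 120
print(f"{cut_fraction:.0%} of the extract removed")  # 33% of the extract removed
```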
Classification of operations
• 5 general classes of operations
• Atomic and complex
• Atomic: deletion, insertion
• Complex: replacement, reordering, merging
• Each split into sub-operations (26 in total)
• Sub-operations linked to triggers, or recognisable surface forms
• Function of units also important
Classification
Atomic operations and sub-operations
• Deletion: complete sentences, subordinate clauses, PPs, adverb phrases, reporting clauses, NPs, determiners, the verb be, specially formatted text, punctuation
• Insertion: connectives, formulaic units, modifiers, punctuation
Classification 2
Complex operations and sub-operations
• Replacement: pronominalisation, lexical substitution, NP restructuring, nominalisation, referred sentences, VPs, passivisation, abbreviations
• Reordering: emphasising, coherence
• Merging: clause/sentence restructuring, punctuation/connectives
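The classification can be held as a small lookup table, which also makes it easy to check that the five classes account for the 26 sub-operations. The dictionary layout and the helper names are illustrative choices, not part of the thesis.

```python
from typing import Dict, List, Tuple

# class name -> (kind, sub-operations), as listed in the classification slides
OPERATIONS: Dict[str, Tuple[str, List[str]]] = {
    "deletion": ("atomic", [
        "complete sentences", "subordinate clauses", "PPs", "adverb phrases",
        "reporting clauses", "NPs", "determiners", "the verb be",
        "specially formatted text", "punctuation",
    ]),
    "insertion": ("atomic", ["connectives", "formulaic units", "modifiers", "punctuation"]),
    "replacement": ("complex", [
        "pronominalisation", "lexical substitution", "NP restructuring",
        "nominalisation", "referred sentences", "VPs", "passivisation", "abbreviations",
    ]),
    "reordering": ("complex", ["emphasising", "coherence"]),
    "merging": ("complex", ["clause/sentence restructuring", "punctuation/connectives"]),
}


def count_sub_operations(taxonomy: Dict[str, Tuple[str, List[str]]]) -> int:
    """Total number of sub-operations across all classes."""
    return sum(len(subs) for _, subs in taxonomy.values())


def classes_of_kind(taxonomy: Dict[str, Tuple[str, List[str]]], kind: str) -> List[str]:
    """Names of the classes that are 'atomic' or 'complex'."""
    return [name for name, (k, _) in taxonomy.items() if k == kind]


print(count_sub_operations(OPERATIONS))      # 26
print(classes_of_kind(OPERATIONS, "atomic"))  # ['deletion', 'insertion']
```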
Deletion
• “The process of removing a unit from a certain place in the extract so it does not appear in the same place in the abstract”
• Used alone or as part of complex operations
• Very useful for reducing text when used alone
• Deletes non-essential units e.g. details, repetitions
• Complete sentences, subordinate clauses, PPs, reporting clauses, determiners, the verb be
Deletion examples
• [I suspect that] the set would be the ideal book for a physicist to be cast away with on a desert island. (new-sci-B7L-54-ljh)
• Three papers published recently in Science move us a little closer to understanding the basis of the disease[, which turns out to be highly complex]. (sci04done-an)
• Britain [is] among [the] front runners as tomorrow’s supercomputers take shape. (sci05done-an)
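In the deletion examples above, deleted material is shown in square brackets. A minimal sketch of reading the abstract side off such annotated text, assuming brackets mark only deletions and are never nested; the function name `apply_deletions` is hypothetical. It also re-capitalises the first character, as a summariser would when the opening unit is deleted.

```python
import re


def apply_deletions(annotated: str) -> str:
    """Drop [bracketed] units and tidy the remaining text (assumes no nested brackets)."""
    text = re.sub(r"\[[^\]]*\]", "", annotated)   # remove each deleted unit
    text = re.sub(r"\s+", " ", text).strip()      # collapse the gaps left behind
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)  # no space before punctuation
    return text[:1].upper() + text[1:]            # re-capitalise the first word


examples = [
    "[I suspect that] the set would be the ideal book for a physicist to be cast away with on a desert island.",
    "Three papers published recently in Science move us a little closer to understanding the basis of the disease[, which turns out to be highly complex].",
    "Britain [is] among [the] front runners as tomorrow’s supercomputers take shape.",
]
for sentence in examples:
    print(apply_deletions(sentence))
```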
Insertion
• “The process of adding a unit which is not present in the extract into the abstract”
• Used alone or as part of complex operations
• Interesting because it adds text to something which is supposed to be reduced
• Used to add coherence and to clarify whilst saving space
• Connectives, modifiers, ‘formulaic units’, punctuation
Insertion examples
• He sees the need to raise public awareness and demystify science and technology as a key point… (new-sci-B7L-75-ljh) [X sees Y as Z]
• The TV series Men of Science is now being shown in a few other areas. (new-sci-B7L-69-ljh)
Replacement
• “The deletion of one unit and the insertion of a different one in the same place in the text”
• Complex operation, can be used in combination with other complex operations
• Useful for avoiding repetition and saving space
• Pronominalisation, lexical substitution, NP restructuring, nominalisation, VPs, passivisation, abbreviations
Replacement examples
• [Zhanat Carr, a radiation scientist with the WHO in Geneva,] The WHO [says] admits the 5000 deaths were omitted because the report was a "political communication tool". (h03-ljh)
• [All this] [is] hardly Culver’s fault. [The same difficulties are to be found in all other parts of evolutionary ecology.] These general difficulties of evolutionary ecology are hardly Culver’s fault. (new-sci-B7L-63-ljh)
Reordering
• “The deletion of a unit from one place in the extract and its insertion in a different place in the abstract”
• Complex operation, can be used in combination with other complex operations
• Its sub-types are functions (emphasising, coherence) rather than distinct operations, which makes it difficult to sub-classify
• Emphasises information, improves coherence and readability
Reordering example
• Text about the world’s second face transplant: all other sentences are about a specific person/operation, so S2 becomes the last sentence
• Experts predict the number of these operations will rise rapidly as centres around the world gear up to perform the procedure. (h01-ljh)
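The definitions of replacement and reordering both build on the two atomic operations: replacement deletes a unit and inserts a different one in the same place, while reordering deletes a unit and inserts the same one somewhere else. The sketch below treats a text as a list of units (an assumed representation; the thesis does not prescribe one), rebuilds the Zhanat Carr replacement example, and moves S2 to the end as in the face transplant example. The unit boundaries and the S1 to S4 labels are hypothetical placeholders.

```python
from typing import List


def delete(units: List[str], i: int) -> List[str]:
    """Atomic deletion: unit i no longer appears in its place."""
    return units[:i] + units[i + 1:]


def insert(units: List[str], i: int, unit: str) -> List[str]:
    """Atomic insertion: a unit not in the input is added at position i."""
    return units[:i] + [unit] + units[i:]


def replace(units: List[str], i: int, new_unit: str) -> List[str]:
    """Replacement: delete unit i, then insert a different unit in the same place."""
    return insert(delete(units, i), i, new_unit)


def reorder(units: List[str], src: int, dst: int) -> List[str]:
    """Reordering: delete unit src, then insert the same unit at dst.

    dst indexes the list after the deletion.
    """
    return insert(delete(units, src), dst, units[src])


# Replacement example (h03-ljh), split into hand-chosen units
extract = [
    "Zhanat Carr, a radiation scientist with the WHO in Geneva,",
    "says",
    'the 5000 deaths were omitted because the report was a "political communication tool".',
]
abstract = replace(replace(extract, 0, "The WHO"), 1, "admits")
print(" ".join(abstract))

# Reordering: S2 of a (placeholder) four-sentence text moved to the end
print(reorder(["S1", "S2", "S3", "S4"], 1, 3))  # ['S1', 'S3', 'S4', 'S2']
```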
Merging
• “Taking information from different units in the extract and presenting them as one unit in the abstract”
• All other operations can be used
• Large class, most difficult to sub-classify – anything (appropriate) goes!
• Best embodies abstracting as opposed to extracting – conciseness
• Restructuring of clauses/sentences, punctuation/connectives
Merging example
• In October 1980 Zuccarelli filed [an expensive] European patent application, covering nine countries including Britain [. … The cost of pushing a European patent through in nine countries is around $10000. The cost of application alone is around $2000 and Zuccarelli has already paid an extra $500 for a further stage of official examination]. (new-sci-B7K-37)
Evaluation
• Applied guidelines to a different set of extracts
• 25 human-produced extracts + corresponding abstracts
• 25 automatically produced extracts + corresponding abstracts
• Developed Centering Theory as an evaluation method due to unsuitability of existing methods
Centering Theory (CT) (Grosz, Joshi & Weinstein 1995)
• Theory of local coherence and salience
• Accounts for coherence using repetitions of entities across consecutive utterances (Cfs, Cps, Cbs)
• Uses the relationship between repetitions to derive ‘transitions’ (position in utterance)
• Transitions are ordered in preference from most to least coherent (continue, retain, smooth shift, rough shift, no transition/no Cb)
Centering Theory: an example
John[Cp] went to his favorite music store to buy a piano.
He[Cp], [Cb] had frequented the store for many years.
He[Cp], [Cb] was excited that he could finally buy a piano.
He[Cp], [Cb] arrived just as the store was closing for the day.
Continue, continue, continue
John[Cp] went to his favorite music store to buy a piano.
It[Cp] was a store John[Cb] had frequented for many years.
He[Cp], [Cb] was excited that he could finally buy a piano.
It[Cp] was closing just as John[Cb] arrived.
Retain, continue, retain
(Grosz, Joshi & Weinstein 1995: 206)
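The transitions in the example above can be derived mechanically once each utterance's forward-looking centers (Cf) are listed in salience order: Cb is the highest-ranked entity of the previous utterance's Cf list that is realised in the current utterance, and Cp is the first entry of the current list. The sketch below follows the standard continue/retain/smooth shift/rough shift table. It assumes an undefined previous Cb counts as unchanged (a common convention), uses hand-coded Cf rankings with pronouns already resolved, and does not model the indirect (bridging) case used in this work.

```python
from typing import List, Optional


def backward_center(prev_cf: List[str], cur_cf: List[str]) -> Optional[str]:
    """Cb(Un): highest-ranked entity of Cf(Un-1) that is also realised in Un."""
    for entity in prev_cf:  # prev_cf is ordered most salient first
        if entity in cur_cf:
            return entity
    return None


def classify(prev_cb: Optional[str], cb: Optional[str], cp: Optional[str]) -> str:
    if cb is None:
        return "no Cb"
    # Assumption: an undefined Cb(Un-1) is treated as equal to Cb(Un).
    same_cb = prev_cb is None or cb == prev_cb
    if same_cb:
        return "continue" if cb == cp else "retain"
    return "smooth shift" if cb == cp else "rough shift"


def transitions(cf_lists: List[List[str]]) -> List[str]:
    """One transition label per utterance after the first."""
    labels: List[str] = []
    prev_cb: Optional[str] = None
    for prev_cf, cur_cf in zip(cf_lists, cf_lists[1:]):
        cb = backward_center(prev_cf, cur_cf)
        cp = cur_cf[0] if cur_cf else None  # Cp(Un): top of the current Cf list
        labels.append(classify(prev_cb, cb, cp))
        prev_cb = cb
    return labels


# Hand-coded Cf rankings (subject first) for the two John texts
version_1 = [["John", "store", "piano"], ["John", "store"], ["John", "piano"], ["John", "store"]]
version_2 = [["John", "store", "piano"], ["store", "John"], ["John", "piano"], ["store", "John"]]
print(transitions(version_1))  # ['continue', 'continue', 'continue']
print(transitions(version_2))  # ['retain', 'continue', 'retain']
```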
Centering Theory: a real example
1. (Everybody)[Cp] should be ready for ((Monday)'s national championship game), despite (casualties in ((Saturday night)'s NCAA semifinal battles)).
   no transition (indirect)
2. (Jason Terry of (Arizona))[Cp], [Cb] was injured.
   retain
3. “(We)[Cp] were going to put (him)[Cb] in late in (the game),” said (Arizona coach (Lute Olson)).
   rough shift
4. “(He)[Cp] had played a lot before (that), of course, but when (we)'re protecting (a lead), (we)[Cb] like getting (four perimeter guys) in there and (that) gives (us) (another ball handler), gives (us) (another free throw shooter).”
   retain
5. (Kentucky coach (Rick Pitino))[Cp] predicted that ((Monday)'s championship game) would also be physical, in view of (((Kentucky)'s all-out pressure defence) and ((Arizona)[Cb]'s blazing speed)).
CT evaluation metric
Transition Weight
Continue +3
Retain +2
No transition (indirect) +1
Smooth shift -1
Rough shift -2
No transition (no Cb) -5
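Applying the weights is straightforward once a text's transitions are labelled. The slides give only the weights, so the aggregation here is an assumption: the sketch sums the weights per text and, because abstracts are shorter than extracts and so have fewer transitions, compares extract and abstract on the mean weight per transition. The labels in the usage lines are taken from the two John texts and the first four labels of the real example; the helper names are hypothetical.

```python
from typing import List

WEIGHTS = {
    "continue": 3,
    "retain": 2,
    "no transition (indirect)": 1,
    "smooth shift": -1,
    "rough shift": -2,
    "no transition (no Cb)": -5,
}


def ct_score(labels: List[str]) -> int:
    """Sum of transition weights for one text."""
    return sum(WEIGHTS[label] for label in labels)


def ct_mean(labels: List[str]) -> float:
    """Mean weight per transition, so texts of different lengths are comparable."""
    return ct_score(labels) / len(labels) if labels else 0.0


def more_coherent(extract_labels: List[str], abstract_labels: List[str]) -> bool:
    # Assumption: the abstract improves on the extract if its mean weight is higher.
    return ct_mean(abstract_labels) > ct_mean(extract_labels)


print(ct_score(["continue", "continue", "continue"]))  # 9
print(ct_score(["retain", "continue", "retain"]))      # 7
print(ct_score(["no transition (indirect)", "retain", "rough shift", "retain"]))  # 3
print(more_coherent(["retain", "continue", "retain"],
                    ["continue", "continue", "continue"]))  # True
```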
Evaluation 2
• Human judgment obtained to complement CT
• Overall, human summary production operations improve texts: CT = 78%; Judge = 82%
• Agreement between CT and judge = 70%
• Classification and resulting guidelines can be reliably used during post-editing in CAS
• CT is useful as an evaluation method
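The slides do not say how the agreement figure was computed. One plausible reading, assumed here, is simple percentage agreement over per-text verdicts (did the abstract improve on its extract, yes or no), alongside the proportion of texts each method judged improved. The verdict lists are hypothetical and only show that two methods can each rate most texts as improved while still disagreeing on individual texts.

```python
from typing import Sequence


def proportion_improved(verdicts: Sequence[bool]) -> float:
    """Share of texts whose abstract was judged an improvement."""
    return sum(verdicts) / len(verdicts)


def percent_agreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Share of texts on which two evaluators give the same verdict."""
    if len(a) != len(b):
        raise ValueError("verdict lists must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical verdicts for five texts (True = abstract judged more coherent)
ct_verdicts = [True, True, False, True, False]
judge_verdicts = [True, False, True, True, True]

print(proportion_improved(ct_verdicts))                # 0.6
print(proportion_improved(judge_verdicts))             # 0.8
print(percent_agreement(ct_verdicts, judge_verdicts))  # 0.4
```

Percentage agreement does not correct for agreement expected by chance; Cohen's kappa is the usual chance-corrected alternative if that matters for the comparison.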
Directions for future work
• To use more human summarisers/judges to further validate classification/guidelines
• To further explore/improve CT for evaluation
• To investigate the feasibility of automating certain elements of summary production operations for CAS
• To look at scientific texts (also popular in AS)