807 - TEXT ANALYTICS Massimo Poesio Lecture 10: Summarization


807 - TEXT ANALYTICS

Massimo Poesio

Lecture 10: Summarization


What is summarization?

To take an information source, extract content from it, and present the most important content to the user in a condensed form and in a manner sensitive to the user’s application needs


Flu stopper
A new compound is set for human testing (Times)

Running nose. Raging fever. Aching joints. Splitting headache. Are there any poor souls suffering from the flu this winter who haven’t longed for a pill to make it all go away? Relief may be in sight. Researchers at Gilead Sciences, a pharmaceutical company in Foster City, California, reported last week in the Journal of the American Chemical Society that they have discovered a compound that can stop the influenza virus from spreading in animals. Tests on humans are set for later this year.

The new compound takes a novel approach to the familiar flu virus. It targets an enzyme, called neuraminidase, that the virus needs in order to scatter copies of itself throughout the body. This enzyme acts like a pair of molecular scissors that slices through the protective mucous linings of the nose and throat. After the virus infects the cells of the respiratory system and begins replicating, neuraminidase cuts the newly formed copies free to invade other cells. By blocking this enzyme, the new compound, dubbed GS 4104, prevents the infection from spreading.

Single-document summarization



Application: Headline news


Application: TV-GUIDES


Application: Abstracts of papers


Multi-document summarization

Multi-document summarization (summarizing a large number of news items at once) is a particularly popular application.


Human summarization and abstracting

• What professional abstractors do
• Ashworth:

• “To take an original article, understand it and pack it neatly into a nutshell without loss of substance or clarity presents a challenge which many have felt worth taking up for the joys of achievement alone. These are the characteristics of an art form”.


Cremmins 82, 96

• Original version:

There were significant positive associations between the concentrations of the substance administered and mortality in rats and mice of both sexes.

There was no convincing evidence to indicate that endrin ingestion induced any of the different types of tumors which were found in the treated animals.

• Edited version:

Mortality in rats and mice of both sexes was dose related.

No treatment-related tumors were found in any of the animals.


Computational Approach: Basics

Top-Down:
• I know what I want! Don’t confuse me with drivel!
• User needs: only certain types of info
• System needs: particular criteria of interest, used to focus search

Bottom-Up:
• I’m dead curious: what’s in the text?
• User needs: anything that’s important
• System needs: generic importance metrics, used to rate content


Query-Driven vs. Text-Driven Focus

• Top-down: Query-driven focus
  – Criteria of interest encoded as search specs.
  – System uses specs to filter or analyze text portions.
  – Examples: templates with slots with semantic characteristics; term lists of important terms.
• Bottom-up: Text-driven focus
  – Generic importance metrics encoded as strategies.
  – System applies strategies over a representation of the whole text.
  – Examples: degree of connectedness in semantic graphs; frequency of occurrence of tokens.


Types of summaries

• Extracts
  – Sentences from the original document are displayed together to form a summary
• Abstracts
  – Material is transformed: paraphrased, restructured, shortened


Ideal stages of summarization

• Analysis
  – Input representation and understanding
• Transformation
  – Selecting important content
• Realization
  – Generating novel text corresponding to the gist of the input


What current systems do

• Most work bottom-up

• Typically use shallow analysis methods
  – Rather than full understanding
• Work by sentence extraction
  – Identify important sentences and piece them together to form a summary

• More advanced work: move towards more abstractive summarization


Shallow approaches

• Relying on features of the input documents that can be easily computed from statistical analysis
• Word statistics
• Cue phrases
• Section headers
• Sentence position


What is the input?

• News, or clusters of news
  – a single article or several articles on a related topic
• Email and email threads
• Scientific articles
• Health information: patients and doctors
• Meeting summarization
• Video


What is the output?

• Keywords
• Highlighted information in the input
• Chunks of text or speech taken directly from the input, or a paraphrase and aggregation of the input in novel ways

• Modality: text, speech, video, graphics


Supervised methods

• Ask people to select sentences
• Use these as training examples for machine learning
  – Each sentence is represented as a number of features
  – Based on the features, distinguish sentences that are appropriate for a summary from sentences that are not
• Run on new inputs


Edmundson 69

• Cue method:
  – stigma words (“hardly”, “impossible”)
  – bonus words (“significant”)
• Key method:
  – similar to Luhn
• Title method:
  – title + headings
• Location method:
  – sentences under headings
  – sentences near beginning or end of document and/or paragraphs (also [Baxendale 58])


Edmundson 69

• Linear combination of four features:

\[ \alpha_1 C + \alpha_2 K + \alpha_3 T + \alpha_4 L \]

• Manually labelled training corpus

• Key not important!

[Figure: bar chart on a 0 to 100% scale comparing RANDOM, KEY, TITLE, CUE, LOCATION, C + K + T + L, and C + T + L]
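The linear combination of the Cue, Key, Title and Location features can be sketched in a few lines of Python. This is only an illustration: the `SentenceFeatures` container, the function names and the default weights are hypothetical, and the per-sentence feature values are assumed to have been computed already. Setting the Key weight to zero by default mirrors the C + T + L configuration named on the slide.

```python
from dataclasses import dataclass


@dataclass
class SentenceFeatures:
    """Hypothetical container for the four Edmundson-style feature scores."""
    cue: float
    key: float
    title: float
    location: float


def edmundson_score(f, weights=(1.0, 0.0, 1.0, 1.0)):
    # weights = (a1, a2, a3, a4); the defaults are illustrative assumptions,
    # with the Key weight set to 0 to mimic the C + T + L combination.
    a1, a2, a3, a4 = weights
    return a1 * f.cue + a2 * f.key + a3 * f.title + a4 * f.location


def rank_sentences(features, weights=(1.0, 0.0, 1.0, 1.0)):
    """Return sentence indices ordered from highest to lowest score."""
    scores = [edmundson_score(f, weights) for f in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


features = [
    SentenceFeatures(cue=0, key=5, title=0, location=0),
    SentenceFeatures(cue=1, key=0, title=1, location=1),
    SentenceFeatures(cue=0, key=0, title=1, location=0),
]
ranking = rank_sentences(features)
```

With the Key feature switched off, the first sentence (which scores only on Key) drops to the bottom of the ranking; with all four weights set to 1 it would come first.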


Kupiec et al. 95

• Extracts of roughly 20% of original text
• Feature set:
  – sentence length
    • |S| > 5
  – fixed phrases
    • 26 manually chosen
  – paragraph
    • sentence position in paragraph
  – thematic words
    • binary: whether sentence is included in manual extract
  – uppercase words
    • not common acronyms
• Corpus:
  – 188 document + summary pairs from scientific journals


Kupiec et al. 95

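• Uses Bayesian classifier:

\[ P(s \in S \mid F_1, F_2, \ldots, F_k) = \frac{P(F_1, F_2, \ldots, F_k \mid s \in S)\, P(s \in S)}{P(F_1, F_2, \ldots, F_k)} \]

• Assuming statistical independence:

\[ P(s \in S \mid F_1, F_2, \ldots, F_k) = \frac{\prod_{j=1}^{k} P(F_j \mid s \in S)\, P(s \in S)}{\prod_{j=1}^{k} P(F_j)} \]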
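The independence-based formula can be turned into a small scoring routine. The sketch below is a minimal illustration, not the implementation of Kupiec et al.: it assumes binary features, estimates every probability by counting over labelled examples, and adds add-one smoothing (an assumption of this sketch) so that unseen feature values do not zero out the product. The function names `train` and `score` are hypothetical.

```python
from collections import Counter


def train(examples):
    """examples: list of (feature_tuple, in_summary) pairs with binary features."""
    n = len(examples)
    n_pos = sum(1 for _, in_summary in examples if in_summary)
    pos_counts = Counter()   # counts of (feature index, value) among summary sentences
    all_counts = Counter()   # counts of (feature index, value) among all sentences
    for feats, in_summary in examples:
        for j, value in enumerate(feats):
            all_counts[(j, value)] += 1
            if in_summary:
                pos_counts[(j, value)] += 1
    return {"n": n, "n_pos": n_pos, "pos": pos_counts, "all": all_counts}


def score(model, feats):
    """P(s in S) * prod_j P(F_j | s in S) / prod_j P(F_j), with add-one smoothing."""
    result = model["n_pos"] / model["n"]          # prior P(s in S)
    for j, value in enumerate(feats):
        p_f_given_s = (model["pos"][(j, value)] + 1) / (model["n_pos"] + 2)
        p_f = (model["all"][(j, value)] + 1) / (model["n"] + 2)
        result *= p_f_given_s / p_f
    return result


examples = [((1, 1), True), ((1, 0), True), ((0, 0), False), ((0, 1), False)]
model = train(examples)
high = score(model, (1, 1))
low = score(model, (0, 0))
```

Because of the smoothing and the independence assumption, the value is best read as a ranking score rather than a calibrated probability; sentences are then selected from the top of the ranking.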


Kupiec et al. 95

• Performance:
  – For 25% summaries, 84% precision
  – For smaller summaries, 74% improvement over Lead


A typical modern supervised summarization system

• Or, what you could do if asked to do one …


Features

• Location
  – Absolute location of the sentence
  – Section structure: first sentence, last sentence, other
  – Paragraph structure
• What section the sentence appeared in
  – Introduction, implementation, example, conclusion, result, evaluation, experiment, etc.


More features

• Sentence length
  – Very long and very short sentences are unusual
• Title word overlap
• Tf.idf word content
  – Binary feature
  – “yes” if the sentence contains one of the 18 most important words
  – “no” otherwise


More features

• Presence and type of citation
• Formulaic expressions
  – “in traditional approaches”, “a novel method for”


Problems with supervised methods for summarization

• Annotation is expensive
  – Here: relevance and rhetorical status judgments
• People don’t agree
  – So more annotators are necessary
  – And/or more training of the annotators


Unsupervised methods for (extractive) summarization: basic idea

• Compute word probability from input

• Compute sentence weight as function of word probability

• Pick best sentence


Sentence ranking options

• Based on word probability
  – S is a sentence with length n
  – p_i is the probability of the i-th word in the sentence

\[ \mathrm{weight}(S) = \frac{\sum_{i=1}^{n} \log(p_i)}{n} \]

• Based on word tf.idf

\[ \mathrm{weight}(S) = \frac{\sum_{i=1}^{n} \mathrm{tf.idf}_i}{n} \]
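The word-probability weighting can be illustrated with a short Python sketch: estimate each word's probability from the input, score every sentence by the average log probability of its words, and pick the best one. The tokenizer and the function names are assumptions made for this example; a practical system would typically also remove stop words.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplistic tokenizer (an assumption of this sketch): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_probabilities(sentences):
    """Relative frequency of each word over the whole input."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def sentence_weight(sentence, probs):
    """Average log probability of the words in the sentence."""
    words = tokenize(sentence)
    if not words:
        return float("-inf")
    return sum(math.log(probs[w]) for w in words) / len(words)


def best_sentence(sentences):
    probs = word_probabilities(sentences)
    return max(sentences, key=lambda s: sentence_weight(s, probs))


sentences = ["flu virus spreads", "flu virus flu", "cats sleep"]
probs = word_probabilities(sentences)
picked = best_sentence(sentences)
```

Here the second sentence wins because it consists only of the most frequent words in the input.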


Centrality measures

• How representative is a sentence of the overall content of a document?
  – The more similar a sentence is to the document, the more representative it is

\[ \mathrm{centrality}(S_i) = \frac{1}{K} \sum_{j \neq i} \mathrm{sim}(S_i, S_j) \]
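As an illustration of the centrality measure, the sketch below averages each sentence's similarity to every other sentence. Two choices here are assumptions, since the slide fixes neither: sim is taken to be cosine similarity over bag-of-words counts, and K is taken to be the number of other sentences.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centrality(sentences):
    """Average similarity of each sentence to all the other sentences."""
    bags = [bag_of_words(s) for s in sentences]
    k = len(bags) - 1                      # assumed normaliser: number of other sentences
    if k <= 0:
        return [0.0] * len(bags)
    return [
        sum(cosine(bags[i], bags[j]) for j in range(len(bags)) if j != i) / k
        for i in range(len(bags))
    ]


sentences = ["the virus spreads", "the virus replicates", "dogs bark"]
scores = centrality(sentences)
```

The two sentences about the virus share two of three words and get the same non-zero centrality, while the unrelated sentence scores zero.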


Beyond word-based sentence extraction

• Discourse information
  – Resolve anaphora, text structure
• Use external lexical resources
  – WordNet, adjective polarity lists, opinion
• Using machine learning


The role of discourse structure

• Claim: The multi-sentence coherence structure of a text can be constructed, and the ‘centrality’ of the textual units in this structure reflects their importance.
• Tree-like representation of texts in the style of Rhetorical Structure Theory (Mann and Thompson, 88).
• Use the discourse representation in order to determine the most important textual units. Attempts:
  – (Ono et al., 94) for Japanese.
  – (Marcu, 97) for English.


Rhetorical parsing (Marcu,97)

[With its distant orbit {– 50 percent farther from the sun than Earth –} and slim atmospheric blanket,1] [Mars experiences frigid weather conditions.2] [Surface temperatures typically average about –60 degrees Celsius (–76 degrees Fahrenheit) at the equator and can dip to –123 degrees C near the poles.3] [Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion,4] [but any liquid water formed that way would evaporate almost instantly5] [because of the low atmospheric pressure.6]

[Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop,7] [most Martian weather involves blowing dust or carbon dioxide.8] [Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap.9] [Yet even on the summer pole, {where the sun remains in the sky all day long,} temperatures never warm enough to melt frozen water.10]


Rhetorical parsing (3)

[Figure: rhetorical structure tree over text units 1 to 10. Relation labels: Elaboration, Background, Justification, Example, Concession, Antithesis, Contrast, Evidence, Cause.]

Summarization = selection of the most important units

2 > 8 > 3, 10 > 1, 4, 5, 7, 9 > 6


Argumentative zoning

• What is the purpose of the sentence? To communicate
  – Background
  – Aim
  – Basis (related work)

• How can we know which sentence serves each aim?


Argumentative zones


Selecting important sentences (relevance)

• How well can it be performed by people?
  – Rather subjective; depends on prior knowledge and interests

• Even the same person would select 50% different sentences if she performed the task at different times

• Still, judgments can be solicited from several people to mitigate the problem

• For each sentence in an article, say whether it is important and interesting enough to be included in a summary


Multi-document summarization

• Very useful for presenting and organizing search results
  – Many results are very similar, and grouping closely related documents helps cover more event facets
  – Summarizing similarities and differences between documents


Standard Approaches

• Salient information = similarities
• Pairwise similarity between all sentences
• Cluster sentences using similarity score (Themes)
• Generate one sentence for each theme
  – Sentence extraction (one sentence/cluster)
  – Sentence fusion: intersect sentences within a theme and choose the repeated phrases. Generate sentence from phrases

• Salient information = important words
• Important words are simply the most frequent in the document set
• SumBasic simply chooses sentences with the most frequent words. Conroy expands on this
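The frequency-based idea behind SumBasic can be sketched as a greedy loop. This is a simplified variant written for illustration, not the published system: it scores each sentence by the average probability of its words, picks the best one, and then squares the probabilities of the words just used so that later picks favour new content. Stop-word removal and the tokenizer details are left out or assumed.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def sumbasic(sentences, max_sentences):
    """Greedy frequency-based selection (simplified SumBasic-style sketch)."""
    tokenized = [tokenize(s) for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    probs = {w: c / total for w, c in counts.items()}

    chosen = []
    candidates = [i for i, toks in enumerate(tokenized) if toks]
    while candidates and len(chosen) < max_sentences:
        # Score = average word probability under the current distribution.
        best = max(candidates,
                   key=lambda i: sum(probs[w] for w in tokenized[i]) / len(tokenized[i]))
        chosen.append(best)
        candidates.remove(best)
        # Down-weight words already covered, to discourage redundant picks.
        for w in set(tokenized[best]):
            probs[w] **= 2
    return [sentences[i] for i in chosen]


sentences = [
    "bodies found in algeria",
    "bodies found near algiers",
    "bomb injured people",
]
summary = sumbasic(sentences, 2)
```

After the first sentence is chosen, the near-duplicate second sentence loses most of its score, so the selection moves on to the sentence about the bomb.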


MEAD (Radev et al. 00)

• MEAD
• Centroid-based
• Based on sentence utility

• Topic detection and tracking initiative [Allen et al. 98, Wayne 98]


1. Algerian newspapers have reported that 18 decapitated bodies have been found by authorities in the south of the country.

2. Police found the ``decapitated bodies of women, children and old men, with their heads thrown on a road'' near the town of Jelfa, 275 kilometers (170 miles) south of the capital Algiers.

3. In another incident on Wednesday, seven people -- including six children -- were killed by terrorists, Algerian security forces said.

4. Extremist Muslim militants were responsible for the slaughter of the seven people in the province of Medea, 120 kilometers (74 miles) south of Algiers.

5. The killers also kidnapped three girls during the same attack, authorities said, and one of the girls was found wounded on a nearby road.

6. Meanwhile, the Algerian daily Le Matin today quoted Interior Minister Abdul Malik Silal as saying that ``terrorism has not been eradicated, but the movement of the terrorists has significantly declined.''

7. Algerian violence has claimed the lives of more than 70,000 people since the army cancelled the 1992 general elections that Islamic parties were likely to win.

8. Mainstream Islamic groups, most of which are banned in the country, insist their members are not responsible for the violence against civilians.

9. Some Muslim groups have blamed the army, while others accuse ``foreign elements conspiring against Algeria.''

1. Eighteen decapitated bodies have been found in a mass grave in northern Algeria, press reports said Thursday, adding that two shepherds were murdered earlier this week.

2. Security forces found the mass grave on Wednesday at Chbika, near Djelfa, 275 kilometers (170 miles) south of the capital.

3. It contained the bodies of people killed last year during a wedding ceremony, according to Le Quotidien Liberte.

4. The victims included women, children and old men.

5. Most of them had been decapitated and their heads thrown on a road, reported the Es Sahafa.

6. Another mass grave containing the bodies of around 10 people was discovered recently near Algiers, in the Eucalyptus district.

7. The two shepherds were killed Monday evening by a group of nine armed Islamists near the Moulay Slissen forest.

8. After being injured in a hail of automatic weapons fire, the pair were finished off with machete blows before being decapitated, Le Quotidien d'Oran reported.

9. Seven people, six of them children, were killed and two injured Wednesday by armed Islamists near Medea, 120 kilometers (75 miles) south of Algiers, security forces said.

10. The same day a parcel bomb explosion injured 17 people in Algiers itself.

11. Since early March, violence linked to armed Islamists has claimed more than 500 lives, according to press tallies.

First article (sentences 1 to 9 above): ARTICLE 18854: ALGIERS, May 20 (UPI)
Second article (sentences 1 to 11 above): ARTICLE 18853: ALGIERS, May 20 (AFP)


MEAD

• INPUT: Cluster of d documents with n sentences (compression rate = r)

• OUTPUT: (n * r) sentences from the cluster with the highest values of SCORE

\[ \mathrm{SCORE}(s) = \sum_i (w_c C_i + w_p P_i + w_f F_i) \]
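A minimal sketch of the selection step, under stated assumptions: each sentence already has three feature values (C, P, F), which in the MEAD literature are the centroid, positional and first-sentence-overlap scores; the slide itself does not define them. The function name and the default weights are hypothetical, and rounding n * r to the nearest integer is a choice made for this sketch.

```python
def mead_select(features, compression_rate, weights=(1.0, 1.0, 1.0)):
    """features: list of (C, P, F) tuples, one per sentence, in cluster order.

    Returns the indices of the selected sentences, in their original order.
    """
    wc, wp, wf = weights                      # illustrative weights, not tuned values
    scores = [wc * c + wp * p + wf * f for c, p, f in features]
    n_keep = max(1, round(len(features) * compression_rate))   # about n * r sentences
    top = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:n_keep]
    return sorted(top)


features = [(0.9, 1.0, 1.0), (0.2, 0.5, 0.1), (0.8, 0.3, 0.4), (0.1, 0.1, 0.0)]
selected = mead_select(features, compression_rate=0.5)
```

Because the score of an extract is a sum over its sentences, keeping the n * r individually highest-scoring sentences maximises that sum.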


Scientific article summarization

• Not only what the article is about, but also how it relates to work it cites

• Determine which approaches are criticized and which are supported
  – Automatic genre-specific summaries are more useful than original paper abstracts


Other uses

• Document indexing for information retrieval

• Automatic essay grading, topic identification module


Evaluating summarization: the problem

• Which human summary makes a good gold standard? Many summaries are good

• At what granularity is the comparison made?

• When can we say that two pieces of text match?


Evaluation

• Many measures for extractive summarization
  – E.g., ROUGE

• New ones for abstractive summarization
  – E.g., Pyramids


Radev: Cluster-Based Sentence Utility

Summary sentence extraction method

Sentence  Ideal  System 1  System 2
S1        +      +         -
S2        +      +         +
S3        -      -         -
S4        -      -         +
S5        -      -         -
S6        -      -         -
S7        -      -         -
S8        -      -         -
S9        -      -         -
S10       -      -         -

CBSU method

Sentence  Ideal   System 1  System 2
S1        10(+)   10(+)     5
S2        8(+)    9(+)      8(+)
S3        2       3         4
S4        7       6         9(+)

CBSU(system, ideal) = % of ideal utility covered by system summary


Interjudge agreement

Judge1 Judge2 Judge3

Sentence 1 10 10 5

Sentence 2 8 9 8

Sentence 3 2 3 4

Sentence 4 5 6 9


Relative utility

Judge1 Judge2 Judge3

Sentence 1 10 10 5

Sentence 2 8 9 8

Sentence 3 2 3 4

Sentence 4 5 6 9

\[ \mathrm{RU} = \frac{13}{17} = 0.765 \]
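The slide gives the result but not the derivation. One reading that reproduces 13/17 from the table, offered here as an interpretation rather than the lecturer's stated procedure: take the two sentences Judge 1 rates highest (Sentences 1 and 2), score them with Judge 3's utilities (5 + 8 = 13), and divide by the best two-sentence total Judge 3 could award (9 + 8 = 17). The function name is hypothetical.

```python
def relative_utility(selected, judge_scores, summary_size):
    """Utility of a selection under one judge, relative to that judge's best selection."""
    achieved = sum(judge_scores[s] for s in selected)
    best = sum(sorted(judge_scores.values(), reverse=True)[:summary_size])
    return achieved / best


judge1 = {"S1": 10, "S2": 8, "S3": 2, "S4": 5}
judge3 = {"S1": 5, "S2": 8, "S3": 4, "S4": 9}

# Two-sentence extract preferred by Judge 1, evaluated against Judge 3.
top2_judge1 = sorted(judge1, key=judge1.get, reverse=True)[:2]
ru = relative_utility(top2_judge1, judge3, summary_size=2)
```

Unlike exact-match precision and recall, this gives partial credit: Sentence 1 is not in Judge 3's top two, but it still contributes its utility of 5.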


ROUGE: Recall-Oriented Understudy for Gisting Evaluation

ROUGE: n-gram co-occurrence metrics measuring content overlap

\[ \text{ROUGE-}n = \frac{\text{counts of } n\text{-gram overlaps between candidate and model summaries}}{\text{total } n\text{-grams in model summaries}} \]
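The ratio can be computed directly by counting n-grams. The sketch below is a bare-bones recall-style ROUGE-n written for illustration; it assumes whitespace tokenization and lowercasing, and it omits the stemming, stop-word options and confidence-interval estimation that the official toolkit provides.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=1):
    """Clipped n-gram overlap with the model summaries, divided by their n-gram count."""
    cand = ngrams(candidate.lower().split(), n)
    overlap = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Each reference n-gram is matched at most as often as it occurs in the candidate.
        overlap += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0


candidate = "two libyans were indicted"
references = ["two libyans indicted"]
r1 = rouge_n(candidate, references, n=1)
r2 = rouge_n(candidate, references, n=2)
```

All three reference unigrams appear in the candidate, but only one of the two reference bigrams does, which shows how higher-order n-grams penalise changes in word order and phrasing.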


ROUGE

• Experimentation with different units of comparison: unigrams, bigrams, longest common subsequence, skip-bigrams, basic elements

• Automatic and thus easy to apply

• Important to consider confidence intervals when determining differences between systems
  – Scores falling within the same interval are not significantly different
  – ROUGE scores place systems into large groups: it can be hard to say definitively that one is better than another

• Sometimes results are unintuitive:
  – Multilingual scores as high as English scores
  – Use in speech summarization shows no discrimination

• Good for training regardless of intervals: can see trends


Pyramids

• Human evaluation of content: Nenkova & Passonneau (2004)
• Based on the distribution of content in a pool of summaries
• Summarization Content Units (SCU):
  – fragments from summaries
  – identification of similar fragments across summaries
    • “13 sailors have been killed” ~ “rebels killed 13 people”
• SCUs have
  – id, a weight, a NL description, and a set of contributors
• SCU1 (w=4) (all similar/identical content)
  – A1: two Libyans indicted
  – B1: two Libyans indicted
  – C1: two Libyans accused
  – D2: two Libyans suspects were indicted


Pyramids

• a “pyramid” of SCUs of height n is created for n gold standard summaries

• each SCU in tier Ti in the pyramid has weight i

• with highly weighted SCU on top of the pyramid

• the best summary is one which contains all units of level n, then all units from n-1,…

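• if D_i is the number of SCUs in summary D which appear in tier T_i, then the weight of the summary is:

\[ D = \sum_{i=1}^{n} i \times D_i \]

[Figure: pyramid with tiers labelled from w=n at the top, through w=n-1, down to w=1 at the base]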


Pyramids score

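• let X be the total number of units in a summary

\[ \mathrm{Max} = \sum_{i=j+1}^{n} i \times |T_i| + j \times \Big( X - \sum_{i=j+1}^{n} |T_i| \Big), \qquad j = \max_i \Big( \sum_{t=i}^{n} |T_t| \ge X \Big) \]

\[ \mathrm{Score} = D / \mathrm{Max} \]

• it is shown that more than 4 ideal summaries are required to produce reliable rankings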
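The pyramid score can be worked through in code. In this sketch, Max is obtained greedily by filling the X available slots from the top tier downwards, which gives the same value as the closed-form expression; the function name, the dictionary-based inputs and the optional `total_units` argument are assumptions made for the example.

```python
def pyramid_score(tier_sizes, summary_tier_counts, total_units=None):
    """tier_sizes[i] = |T_i|, the number of SCUs of weight i in the pyramid.
    summary_tier_counts[i] = D_i, the number of the summary's SCUs found in tier i.
    total_units = X; defaults to the number of SCUs the summary matched.
    """
    d = sum(i * count for i, count in summary_tier_counts.items())
    x = sum(summary_tier_counts.values()) if total_units is None else total_units

    # Max: best achievable weight with X SCUs, taking the heaviest tiers first.
    remaining = x
    max_weight = 0
    for i in sorted(tier_sizes, reverse=True):
        take = min(remaining, tier_sizes[i])
        max_weight += i * take
        remaining -= take
        if remaining == 0:
            break
    return d / max_weight if max_weight else 0.0


# Pyramid built from 4 model summaries: 1 SCU of weight 4, 2 of weight 3, and so on.
tier_sizes = {4: 1, 3: 2, 2: 3, 1: 5}
# A summary expressing one weight-4 SCU, two weight-2 SCUs and one weight-1 SCU.
summary = {4: 1, 2: 2, 1: 1}
score = pyramid_score(tier_sizes, summary)
```

Here D = 4 + 2·2 + 1 = 9 and X = 4; an ideal four-unit summary would take the weight-4 SCU, both weight-3 SCUs and one weight-2 SCU for Max = 12, so the score is 9/12 = 0.75.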


Human performance / Best system

          Pyramid     Modified    Resp       ROUGE-SU4
Human     B: 0.5472   B: 0.4814   A: 4.895   A: 0.1722
Human     A: 0.4969   A: 0.4617   B: 4.526   B: 0.1552
System    14: 0.2587  10: 0.2052  4: 2.85    15: 0.139

Best system ~50% of human performance on manual metrics

Best system ~80% of human performance on ROUGE


ACKNOWLEDGMENTS

• Many slides borrowed from Ani Nenkova (Penn), Drago Radev (Uni Michigan) and Daniel Marcu (ISI)