EACL 2006
Giuseppe Carenini, Raymond T. Ng, Adam Pauls
Computer Science Dept.University of British Columbia
Vancouver, CANADA
Multi-Document Summarization of Evaluative Text
Motivation and Focus

● Large amounts of information in text form are constantly produced: news, reports, reviews, blogs, emails…
● Pressing need to summarize
● Considerable work so far, but largely limited to factual information
Our Focus

Evaluative documents (good vs. bad, right vs. wrong) about a single entity
● Customer reviews (e.g. Amazon.com)
● Travel logs about a destination
● Teaching evaluations
● User studies (!)
● …
Our Focus
We want to do this:
“The Canon G3 is a great camera. . .”
“Though great, the G3 has bad menus. . .”
“I love the Canon G3! It . . .”
Most users liked the Canon G3. Even though some did not like the menus, many . . .
Two Approaches

Automatic summarizers generally produce two types of summaries:
1. Extracts: a representative subset of text from the original corpus
2. Abstracts: generated text that contains the most relevant information from the original corpus
Two Approaches (cont'd)

Extract-based summarizers generally fare better for factual summarization (cf. DUC 2005)
But extracts aren't well suited to capturing evaluative information
● Can't express the distribution of opinions (‘some/all’)
● Can't aggregate opinions either numerically or conceptually
So we tried both
Two Approaches (cont'd)

Extract-based approach (MEAD*):
● Based on the MEAD summarization framework (Radev et al. 2003)
● Augmented with knowledge of evaluative info (I'll explain later)
Abstract-based approach (SEA):
● Based on GEA (Carenini & Moore, 2001), a framework for generating evaluative arguments about an entity
Pipeline Approach (for both)

[Diagram] Evaluative Documents → Extraction of evaluative info → Organization of extracted info → Selection of extracted info → Presentation of extracted info

The extraction stage is shared between the two approaches.
Extracting evaluative info

We adopt previous work of Hu & Liu (2004) (but many others exist…)
Their approach extracts:
● What features of the entity are evaluated
● The strength and polarity of each evaluation, on the [-3 … +3] interval
The approach is (mostly) unsupervised
Examples
•“the menus are easy to navigate and the buttons are easy to use. it is a fantastic camera ……”•“… the canon computer software used to download , sort , . . . is very easy to use. the only two minor issues i have with the camera are the lens cap ( it is not very snug and can come off too easily). . . .”
Feature Discovery
•“the menus are easy to navigate and the buttons are easy to use. it is a fantastic camera …”•“…… the canon computer software used to download , sort , . . . is very easy to use. the only two minor issues i have with the camera are the lens cap ( it is not very snug and can come off too easily). . . .”
Strength/Polarity Determination
•“the menus are easy to navigate(+2) and the buttons are easy to use(+2). it is a fantastic(+3) camera …”•“…… the canon computer software used to download , sort , . . . is very easy to use (+3). the only two minor issues i have with the camera are the lens cap ( it is not very snug (-2) and can come off too easily (-2))...”
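The annotated sentences above can be read back into (phrase, strength) pairs with a short parser. This is a sketch: the inline “(+n)”/“(-n)” format mirrors the slide examples, and the function name is illustrative; a real system would run Hu & Liu-style extraction rather than parse annotations.

```python
import re

def parse_evaluations(text):
    """Extract (phrase, strength) pairs from text annotated like 'easy to use(+2)'.

    The inline '(+n)'/'(-n)' annotation format follows the slide examples;
    strengths fall on the [-3, +3] interval used throughout the talk.
    """
    pairs = []
    for match in re.finditer(r"([\w\s/,-]+?)\(([+-]\d)\)", text):
        phrase = match.group(1).strip()      # the evaluated text span
        strength = int(match.group(2))       # signed polarity/strength
        pairs.append((phrase, strength))
    return pairs

sentence = "the menus are easy to navigate(+2) and the buttons are easy to use(+2)."
print(parse_evaluations(sentence))
```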
Pipeline Approach (for both)

[Diagram] Evaluative Documents → Extraction of evaluative info (shared) → Organization of extracted info (partially shared) → Selection of extracted info → Presentation of extracted info
Organizing Extracted Info

Extraction provides a bag of features. But:
● features are redundant
● features may range from concrete and specific (e.g. “resolution”) to abstract and general (e.g. “image”)
Solution: map features to a hierarchy [Carenini, Ng, & Zwart 2005]
Feature Ontology

[Figure: feature ontology for the Canon G3 Digital Camera. The root node “Canon G3 Digital Camera” (matched from mentions like “canon”, “canon g3”, “digital camera”) carries the evaluations [-1,-1,+1,+2,+2,+3,+3,+3]. Children include User Interface (with Buttons, Menus, Lever) and Convenience (with Battery, Battery Life, Battery Charging System), and so on; nodes carry their own evaluation lists, e.g. [+1], [+2,+2,+2,+3,+3], [-1,-1,-2].]
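The ontology can be represented as a small tree of nodes, each carrying its own evaluations. A hedged sketch follows; the exact pairing of evaluation lists to nodes in the figure is not fully recoverable, so the placements below are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    """One node of the feature ontology; evaluations are polarity/strength
    values on the [-3, +3] interval gathered from the reviews."""
    name: str
    evaluations: list = field(default_factory=list)
    children: list = field(default_factory=list)

# Fragment of the Canon G3 ontology above; evaluation placements are
# illustrative, since the slide layout does not fix them exactly.
camera = Feature(
    "Canon G3 Digital Camera",
    [-1, -1, +1, +2, +2, +3, +3, +3],
    [
        Feature("User Interface", [+1],
                [Feature("Buttons"), Feature("Menus"), Feature("Lever")]),
        Feature("Convenience", [+1],
                [Feature("Battery", [],
                         [Feature("Battery Life"),
                          Feature("Battery Charging System")])]),
    ],
)
print(camera.children[0].name)  # User Interface
```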
Organization: SEA vs. MEAD*
SEA operates only on the hierarchical data and forgets about raw extracted features
MEAD* operates on the raw extracted features and only uses hierarchy for sentence ordering (I'll come back to this)
Pipeline Approach (for both)

[Diagram] Evaluative Documents → Extraction of evaluative info (shared) → Organization of extracted info (partially shared) → Selection of extracted info (not shared) → Presentation of extracted info
Feature Selection: SEA

We define a measure of importance moi(f_i) for each feature f_i in the hierarchy of features:

  moi(f_i) = dir_moi(f_i)   if f_i is a leaf node
  moi(f_i) = α·dir_moi(f_i) + (1−α)·Σ_{f_k ∈ children(f_i)} moi(f_k)   otherwise

where the direct measure of importance sums the squared polarity/strength evaluations ps_k ∈ PS_i attached to f_i:

  dir_moi(f_i) = Σ_{ps_k ∈ PS_i} (ps_k)²

[Example from the Canon G3 ontology: the root “Canon G3 Digital Camera” with evaluations [-1,-1,+1,+2,+2,+3,+3,+3] and children “User Interface” and “Convenience” [+1]]
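The recursion above can be sketched directly. The dict-based tree encoding and the choice α = 0.9 are illustrative assumptions (the slides do not give α's value); only the formulas themselves come from the talk.

```python
def dir_moi(evaluations):
    """Direct measure of importance: sum of squared polarity/strength
    values, so strong opinions (±3) count far more than mild ones (±1)."""
    return sum(ps * ps for ps in evaluations)

def moi(node, alpha=0.9):
    """Measure of importance over the hierarchy: a leaf's importance is its
    direct moi; an internal node mixes its own direct moi with the summed
    moi of its children. alpha = 0.9 is an illustrative weight."""
    if not node["children"]:
        return dir_moi(node["evals"])
    child_sum = sum(moi(c, alpha) for c in node["children"])
    return alpha * dir_moi(node["evals"]) + (1 - alpha) * child_sum

tree = {
    "name": "Canon G3 Digital Camera",
    "evals": [-1, -1, +1, +2, +2, +3, +3, +3],   # dir_moi = 38
    "children": [
        {"name": "User Interface", "evals": [+1], "children": []},
    ],
}
print(moi(tree))  # 0.9·38 + 0.1·1 ≈ 34.3
```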
Selection Procedure

Straightforward greedy selection would not work: if a node derives most of its importance from its child(ren), including both the node and the child(ren) would be redundant
⇒ Dynamic greedy selection:
Until the desired number of features is selected:
● The most important node is selected
● That node is removed from the tree
● The importance of the remaining nodes is recomputed
Similar to the redundancy-reduction step in many automatic summarization algorithms
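A minimal sketch of this dynamic selection loop, reusing the moi definition from the previous slide. The dict tree encoding, α = 0.9, and the choice to promote a removed node's children to its position are assumptions for illustration, not the paper's exact procedure.

```python
def dir_moi(evals):
    # sum of squared polarity/strength values
    return sum(ps * ps for ps in evals)

def moi(node, alpha=0.9):
    # leaf: direct importance; internal: mix of own and children's moi
    if not node["children"]:
        return dir_moi(node["evals"])
    return (alpha * dir_moi(node["evals"])
            + (1 - alpha) * sum(moi(c, alpha) for c in node["children"]))

def select_features(root, k):
    """Dynamic greedy selection: repeatedly pick the most important node,
    remove it from the tree (its children are promoted to its place), and
    recompute importance, so a parent whose importance came mostly from a
    selected child is no longer over-counted."""
    selected = []
    forest = [root]  # remaining subtrees
    for _ in range(k):
        if not forest:
            break
        def all_nodes(nodes):
            for n in nodes:
                yield n
                yield from all_nodes(n["children"])
        best = max(all_nodes(forest), key=moi)
        selected.append(best["name"])
        def remove(nodes):
            # splice best's children into its position
            for i, n in enumerate(nodes):
                if n is best:
                    nodes[i:i + 1] = n["children"]
                    return True
                if remove(n["children"]):
                    return True
            return False
        remove(forest)
    return selected

tree = {"name": "camera", "evals": [3, 3, 3], "children": [
    {"name": "image quality", "evals": [3, 3], "children": []},
    {"name": "lens", "evals": [1], "children": []}]}
print(select_features(tree, 2))  # ['camera', 'image quality']
```

Note how recomputation is automatic here: once a node is spliced out, its evaluations no longer contribute to any remaining node's moi.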
Feature Selection: MEAD*

MEAD* selects sentences, not features
Calculate a score for each sentence s_i from the polarity/strength evaluations it contains:

  score(s_i) = Σ_{ps_k ∈ s_i} |ps_k|

e.g. “the menus are easy to navigate(+2) and the buttons are easy to use(+2).” scores 2 + 2 = 4
Break ties with the MEAD centroid (a common feature in multi-document summarization)
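As a one-line sketch of the score (summing absolute strengths, so a sentence packed with strong opinions, positive or negative, ranks highest):

```python
def sentence_score(evaluations):
    """MEAD*-style sentence score: sum of the absolute strengths of all
    polarity/strength evaluations the sentence contains."""
    return sum(abs(ps) for ps in evaluations)

# "the menus are easy to navigate(+2) and the buttons are easy to use(+2)."
print(sentence_score([+2, +2]))  # 4
```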
Feature Selection: MEAD*

We want to extract sentences for the most important features, and only one sentence per feature
Put each sentence into a “bucket” for each feature it evaluates, e.g.
“the menus are easy to navigate(+2) and the buttons are easy to use(+2).” goes into both the menus and buttons buckets; “I like the menus…” goes into the menus bucket only
Feature Selection: MEAD*
Take the (single) highest scoring sentence from the “fullest” buckets until desired summary length is reached
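The bucket procedure can be sketched as follows. The (text, {feature: strength}) sentence encoding and the tie-breaking are illustrative assumptions; the talk breaks ties with the MEAD centroid instead.

```python
from collections import defaultdict

def mead_star_select(sentences, n):
    """Bucket-based selection sketch: each sentence goes into one bucket per
    feature it evaluates; we repeatedly take the single highest-scoring
    sentence from the fullest bucket until n sentences are chosen.
    Sentences are (text, {feature: strength}) pairs (illustrative format)."""
    buckets = defaultdict(list)
    for text, evals in sentences:
        for feature in evals:
            buckets[feature].append((text, evals))
    chosen, chosen_texts = [], set()
    while buckets and len(chosen) < n:
        # fullest bucket = feature evaluated by the most sentences
        feature = max(buckets, key=lambda f: len(buckets[f]))
        candidates = [s for s in buckets.pop(feature) if s[0] not in chosen_texts]
        if not candidates:
            continue  # every sentence for this feature was already taken
        # highest-scoring = largest sum of absolute strengths
        best = max(candidates, key=lambda s: sum(abs(v) for v in s[1].values()))
        chosen.append(best[0])
        chosen_texts.add(best[0])
    return chosen
```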
Pipeline Approach (for both)

[Diagram] Evaluative Documents → Extraction of evaluative info (shared) → Organization of extracted info (partially shared) → Selection of extracted info (not shared) → Presentation of extracted info (not shared)
Presentation: MEAD*
Display selected sentences in order from most general (top of feature hierarchy) to most specific
That's it!
Presentation: SEA

SEA (Summarizer of Evaluative Arguments) is based on GEA (Generator of Evaluative Arguments) (Carenini & Moore, 2001)
GEA takes as input:
● a hierarchical model of features for an entity
● objective values (good vs. bad) for each feature of the entity
Adaptation is (in theory) straightforward
Possible GEA Output
The Canon G3 is a good camera. Although the interface is poor, the image quality is excellent.
Target SEA Summary

Most users thought the Canon G3 was a good camera. Although several users did not like the interface, almost all users liked the image quality.
Extra work

● What GEA gives us:
  – High-level text plan (i.e. content selection and ordering)
  – Cue phrases for the argumentation strategy (“In fact”, “Although”, etc.)
● What GEA does not give us:
  – Appropriate micro-planning (lexicalization)
● Need to give an indication of the distribution of customer opinions
Microplanning (incomplete!)

We generate one clause for each selected feature
Each clause includes 3 key pieces of information:
1. Distribution of customers who evaluated the feature (“many”, “most”, “some”, etc.)
2. Name of the feature (“menus”, “image quality”, etc.)
3. Aggregate of opinions (“excellent”, “fair”, “poor”, etc.)
→ “most users found the menus to be poor”
Microplanning

Distribution is (roughly) based on the fraction of customers who evaluated the feature (+ disagreement…)
Name of the feature is straightforward
Aggregate of opinions is based on a function similar in form to the measure of importance, but averaging polarity/strength over all evaluations rather than summing
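A minimal sketch of generating one such clause from the three pieces of information. The quantifier and adjective thresholds below are illustrative assumptions; the slides do not give the exact mapping.

```python
def clause(feature, evaluations, total_reviewers):
    """Generate one evaluative clause from: a distribution quantifier,
    the feature name, and an aggregate opinion adjective.
    Thresholds are illustrative, not the paper's exact mapping."""
    # 1. distribution: fraction of reviewers who evaluated this feature
    frac = len(evaluations) / total_reviewers
    if frac > 0.8:
        quant = "almost all users"
    elif frac > 0.5:
        quant = "most users"
    elif frac > 0.3:
        quant = "many users"
    else:
        quant = "some users"
    # 3. aggregate: average (not summed) polarity/strength
    avg = sum(evaluations) / len(evaluations)
    if avg >= 2.5:
        adj = "excellent"
    elif avg >= 1:
        adj = "good"
    elif avg > -1:
        adj = "fair"
    else:
        adj = "poor"
    # 2. feature name is used directly
    return f"{quant} found the {feature} to be {adj}"

print(clause("menus", [-2, -2, -1, -3], 6))
# most users found the menus to be poor
```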
Microplanning
We “glue” clauses together using cue phrases from GEA
Also perform basic aggregation
Formative Evaluation

Goal: test users' perceived effectiveness
Participants: 28 undergrad students
Procedure:
● Pretend they worked for the manufacturer
● Given 20 reviews (from either the Camera or the DVD corpus) and asked to generate a summary (~100 words) for the marketing dept
● After 20 mins, given a summary of the 20 reviews
● Asked to fill out a questionnaire assessing summary effectiveness (multiple choice and open form)
Formative Evaluation (cont'd)

Conditions: each user was given one of 4 summaries:
1. Topline summary (human)
2. Baseline summary (vanilla MEAD)
3. MEAD* summary
4. SEA summary
Quantitative Results

Responses on a scale from 1 (Strongly disagree) to 5 (Strongly agree)

                      SEA    MEAD*   DUC (median)   MEAD   Human
Grammaticality        3.43   2.71       3.86        3.14   4.29
Non-redundancy        3.14   3.86       4.44        3.57   4.43
Referential clarity   3.86   4.00       2.98        3.00   4.71
Focus                 4.14   3.71       3.16        2.29   4.14
Structure             2.29   3.00       2.10        1.86   4.43
Linguistic Avg.       3.37   3.46       3.31        2.77   4.40
Recall                2.33   2.57        —          1.57   3.57
Precision             4.17   3.50        —          2.17   3.86
Accuracy              4.00   3.57        —          2.57   4.29
Content Avg.          3.50   3.21        —          2.10   3.90
Overall               3.14   3.14        —          2.14   4.43
Macro Avg.            3.39   3.34        —          2.48   4.24
Qualitative Results: MEAD*

Surprising: many participants didn't notice, or didn't mind, the verbatim text extraction
Two major complaints about content:
1. The summary was not representative (a negative sentence was extracted even though the majority of opinions were positive)
2. Evaluations of some features were repeated
(2) could be addressed, but (1) can only partially be fixed with pure extraction
Qualitative Results: SEA

Some complaints about the “robotic” feel of the summary, and about repetition/lack of pronouns
→ Need to do more complex microplanning
Some wanted more details (which “manual features . . . “)
● Note: this complaint was absent with MEAD*
Some disagreed with the feature selection (precision/recall), but this is a problem even with human summaries
Conclusions

Extraction works surprisingly well, even for evaluative summarization
Topline > MEAD*, SEA > Baseline
Need to combine the strengths of SEA and MEAD* for evaluative summarization:
● Need the detail, variety, and natural-sounding text provided by extraction
● Need to generate opinion distributions
● Need the argument structure from SEA (?)
Other Future Work

● Automatically induce the feature hierarchy
● Produce summaries tailored to user preferences about the evaluated entity
● Summarize corpora of evaluative documents about more than one entity
Examples

MEAD*: Bottom line, well made camera, easy to use, very flexible and powerful features to include the ability to use external flash and lense/filters choices. It has a beautiful design, lots of features, very easy to use, very configurable and customizable, and the battery duration is amazing! Great colors, pictures and white balance. The camera is a dream to operate in automode, but also gives tremendous flexibility in aperture priority, shutter priority, and manual modes. I'd highly recommend this camera for anyone who is looking for excellent quality pictures and a combination of ease of use and the flexibility to get advanced with many options to adjust if you like.
Examples

SEA: Almost all users loved the Canon G3 possibly because some users thought the physical appearance was very good. Furthermore, several users found the manual features and the special features to be very good. Also, some users liked the convenience because some users thought the battery was excellent. Finally, some users found the editing/viewing interface to be good despite the fact that several customers really disliked the viewfinder. However, there were some negative evaluations. Some customers thought the lens was poor even though some customers found the optical zoom capability to be excellent. Most customers thought the quality of the images was very good.
Examples

MEAD: I am a software engineer and am very keen into technical details of everything i buy, i spend around 3 months before buying the digital camera; and i must say, g3 worth every single cent i spent on it. I don't write many reviews but i'm compelled to do so with this camera. I spent a lot of time comparing different cameras, and i realized that there is not such thing as the best digital camera. I bought my canon g3 about a month ago and i have to say i am very satisfied.
Examples

Human: The Canon G3 was received exceedingly well. Consumer reviews from novice photographers to semi-professionals all listed an impressive number of attributes they claim make this camera superior in the market. Customers are pleased with the many features the camera offers, and state that the camera is easy to use and universally accessible. Picture quality, long-lasting battery life, size and style were all highlighted in glowing reviews. One flaw in the camera frequently mentioned was the lens, which partially obstructs the view through the viewfinder; however, most claimed it was only a minor annoyance since they used the LCD screen.
Microplanning

We “glue” clauses together using cue phrases from GEA:
● “Although”, “however”, etc. indicate opposing evidence
● “Because”, “in particular” indicate supporting evidence
● “Furthermore” indicates elaboration
We also perform basic aggregation:
“most users found the menus to be poor” + “most users found the buttons to be poor”
→ “most users found the menus and buttons to be poor”
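The aggregation step above can be sketched as merging clauses that share both their quantifier and their aggregate evaluation. Representing clauses as (quantifier, feature, adjective) triples is an illustrative assumption; the talk's system works over generated text plans.

```python
def aggregate_clauses(clauses):
    """Basic aggregation sketch: clauses sharing the same quantifier and the
    same aggregate evaluation are merged by joining their feature names,
    e.g. 'most users found the menus to be poor' + '... the buttons ...'
    -> 'most users found the menus and buttons to be poor'.
    Clauses are (quantifier, feature, adjective) triples (illustrative)."""
    merged = []
    for quant, feature, adj in clauses:
        for m in merged:
            if m[0] == quant and m[2] == adj:
                m[1].append(feature)   # same quantifier + evaluation: merge
                break
        else:
            merged.append([quant, [feature], adj])
    return [f"{quant} found the {' and '.join(features)} to be {adj}"
            for quant, features, adj in merged]

print(aggregate_clauses([
    ("most users", "menus", "poor"),
    ("most users", "buttons", "poor"),
]))
# ['most users found the menus and buttons to be poor']
```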