Dissertation defense slides for "Semantic Analysis for Improved Multi-document Text Summarization".
Ph.D. Dissertation Defense
Semantic Analysis for Improved Multi-Document Summarization
Quinsulon L. Israel
Committee Members:
Dr. Il-Yeol Song (Chair)
Dr. Hyoil Han (Co-chair)
Dr. Jung-Ran Park
Dr. Harry Wang
Dr. Erjia Yan
Overview
Motivation
Background
Research Goals
Literature Review
Methodology
› Approach 1 - MDS by Aggregate SDS via Semantic Linear Combination
› Approach 2 - MDS by Semantic Triples Clustering with Focus Overlap
Evaluation
Results
Conclusion
Further Work
Motivation
▪ What is automatic focused Multi-Document Text Summarization (fMDS)?
‒ Automatic text summarization: creation of a gist of text by an artificial system
‒ Multi-document summarization: automatic summarization of multiple, yet related documents
‒ fMDS: multi-document summarization focused on some input given to an artificial system
▪ Why automatic focused Multi-Document text Summarization (fMDS)?
‒ Purpose: Information overload reduction of multiple, related documents according to an inputted focus (i.e. query, topic, question, etc.)
‒ Use: Quick overviews of news and reports by analysts that focus on specific details and/or new information
‒ How: Extract subset of sentences from multiple, related text sources, while maximizing “informativeness” and reducing redundancy in the new summary
Motivation (cont.)
“Government Analyst” scenario (figure)
Hypothesis
The use of semantic analysis can improve focused multi-document summarization (fMDS) beyond the use of baseline sentence features.
› Semantic analysis here uses light-weight semantic triples to help both represent and filter sentences.
› Semantic analysis also uses weights assigned to the semantic classes (e.g., named entities typed as person, organization, location, date, etc.).
› Semantic analysis also uses “semantic cues” for identifying important information.
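As a concrete illustration of the semantic class scoring named in the hypothesis, the named entities in a sentence can be weighted by their class. This is a minimal sketch: the class list and the weight values below are hypothetical, not the weights used in this work.

```python
# Illustrative semantic class scoring: weight a sentence by the NER classes
# it contains. The weights below are hypothetical placeholders.
CLASS_WEIGHTS = {"PERSON": 1.0, "ORGANIZATION": 1.0, "LOCATION": 0.8, "DATE": 0.5}

def semantic_class_score(entities):
    """Sum the weights of the semantic classes found in one sentence.

    `entities` is a list of (surface_form, ner_class) pairs, e.g. as produced
    by an NER annotator such as GATE ANNIE.
    """
    return sum(CLASS_WEIGHTS.get(ner_class, 0.0) for _, ner_class in entities)

entities = [("Airbus", "ORGANIZATION"), ("Toulouse", "LOCATION"), ("May 7, 2004", "DATE")]
score = semantic_class_score(entities)  # 1.0 + 0.8 + 0.5 = 2.3
```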
Motivation (cont.)
Research Question & Sub-questions
What effects does semantic analysis of sentences have on the improvement of focused multi-document summarization (fMDS)?
› What is the effect on overall system performance of clustering sentences based on semantic analysis for improving fMDS?
› What is the effect on overall system performance of using the semantic class scoring of sentences for improving fMDS?
Research Goals
› Improve upon the gold standard baseline
› Examine the use of the new “semantic class scoring” with “semantic cue scoring”
› Examine the use of the new “semantic triples clustering” methods for extractive fMDS
› Create a portable, light-weight improvement for fMDS that can be easily modified
Background
Human Summarization Activity
› Single document summarization (SDS): 79% “direct match” to a sentence within the source document (Kupiec, Pedersen et al. 1995)
› Multiple document summarization (MDS): uses 55% of the “vocabulary” contained within source documents (Copeck and Szpakowicz 2004)
Man vs. Machine
› “Highly frequent content words” from the corpus were not found in automatic summaries during evaluation but appear in human summaries (Nenkova, Vanderwende et al. 2006)
› Man and machine both have difficulties with generic MDS (Copeck and Szpakowicz 2004)
Background (cont.)
Human Summarization Agreement
› SDS: 40% unigram overlap (Lin and Hovy 2002)
− Humans tend to agree on “highly frequent content words” (Nenkova, Vanderwende et al. 2006)
− Words not agreed upon may not be highly frequent but may still be useful
› SDS: 30-40 summaries before consensus
› MDS: no such human studies found within the literature
Background (cont.)
Focused MDS (fMDS) Process
Figure 1. Standard summarization system as a multi-phase process:
Sentence Processing → Focus Processing → Focus-to-Sentences Comparison → Sentence Scoring and Ranking → Sentence Selection (into summary) → Redundancy Removal → Sentence Compression → Summary Truncation
• Process the initial focus and document sentences
• Score and rank focus-salient sentences
• Add sentences to the summary until a pre-determined length
* Compression is optional. Sentence scoring and ranking can be an iterative process.
Background (cont.)
Research System
Figure 2. Research system pipeline (built on the GATE 7 Toolkit with the MultiPax plugin); input is a text collection, output is a text summary:
• Sentence Processing (sentence splitting, tokenization, POS, NER, phrase detection)
• Semantic Annotation (person, location, organization)
• Semantic Triples Parsing (subject-verb-object)
• Sentence Scoring (semantic classes, semantic cues into aggregate score, query overlap)
• Conceptual Representations (semantic triples clustering)
• Sentence Selection (STC cluster representative)
The light-weight semantic components developed for this work and their improvements are the novel features of the system.
Overview
Summarization Timeline
Systems Comparison
› Probability/statistical modeling
› Features combination
› Multi-level text relationships
› Graph-based
› Semantic analysis
Literature Review
Summarization Timeline
• 1958: Lexical occurrence statistics (Luhn)
• 1961: Linguistic approaches
• 1968: Position, cue words (Edmundson)
• 1973: “Cohesion streamlining” (Mathais)
• 1977: Frames, semantic networks (TOPIC system)
• 1979: Logic rules, generative (SUSY system)
• 1982: “Sketchy scripts” (templates; FRUMP system)
• 1988-1989: Hybrid representations, corpus-based (SCISOR system)
• 1990-1999: Return of occurrence statistics (“Renaissance” era)
• 2000: Statistics, corpus training (“state-of-the-art” era)
• 2004: MDS
• 2010-2011: Deeper semantic, structural analyses
Literature Review (cont.)
System Categories

Author    | Year | Statistical Approaches | Features Combination | Graph-based | Multi-level Text Relationships | Semantic Analysis
Conroy    | 2006 | x |   |   |   |
Nenkova   | 2006 | x |   |   |   |
Arora     | 2008 | x |   |   |   |
Yih       | 2007 |   | x |   |   |
Ouyang    | 2007 |   | x |   |   |
Wan       | 2008 |   |   | x |   |
Wei       | 2008 |   |   |   | x |
Wang      | 2008 |   |   |   |   | x
Harabagiu | 2010 |   |   |   |   | x
Literature Review (cont.)
Because systems report different evaluation measures or use
different datasets, normalizing performances across years is
not possible.
Statistical Approaches
Literature Review (cont.)

Authors  | Year | Compress | Mat Calc | Freq/Prob | LDA | Summary Op
Conroy   | 2006 | x | x | x |   |
Nenkova* | 2006 |   |   | x |   |
Arora    | 2008 |   |   |   | x | x

Legend:
Focused: Uses some focusing input to the system
Compress: Uses some form of simplification to add more information
Mat Calc: Uses complex matrix calculations to filter and select sentences
Freq/Prob: Uses a statistical or probability distribution method to score terms
LDA: Uses complex Latent Dirichlet Allocation to model summaries
Summary Op: Uses a method of creating multiple summaries and choosing the optimum summary
* Not focus-based, but still important

Authors | Year | Advantages | Disadvantages
Conroy  | 2006 | Uses likely human vocabulary | Uses other collections external to the one observed
Nenkova | 2006 | Simple yet relatively effective | Reports a frequency-based indicator of human vocabulary but uses probability instead
Arora   | 2008 | Captures fixed topics from corpus, optimizes summary | Very complex; relies on sampling over the corpus; a sentence can represent only one topic
Features Combination
Literature Review (cont.)

Authors | Published | Log Reg | Word Pos | Summary Op | Freq/Prob | Sentence Pos | NER Count | WordNet
Yih     | 2007 | x | x | x | x |   |   |
Ouyang  | 2007 |   | x |   |   | x | x | x

Legend:
Log Reg: Uses logistic regression to tune a scoring estimator
Word Pos: Adds a word position metric to the score
Freq/Prob: Uses a statistical or probability distribution method to score terms
Sentence Pos: Adds a sentence position metric to the score
Summary Op: Uses a method of creating multiple summaries and choosing the optimum summary
NER Count: Counts named entities found and adds the count to the score
WordNet: Used to determine semantically related words

Authors | Year | Advantages | Disadvantages
Yih     | 2007 | Simple estimated scoring, optimizes summary | Uses only two sentence features; no comparison of meaning between words
Ouyang  | 2007 | Determines most important features | Semantics between words in focus and sentence compared arbitrarily
Graph-based Approaches
Literature Review (cont.)

Authors | Published | Bi-partite | CM Rand Walk
Wan     | 2008 | x | x

Legend:
Bi-partite: Uses a bi-partite link graph (between clusters and sentences)
CM Rand Walk: Conditional Markov Random Walk done between nodes in the graph

Authors | Year | Advantages | Disadvantages
Wan     | 2008 | Introduces link analysis via modified Google PageRank to MDS | Uses only cosine similarity for all values, thus no comparison of meaning between words
Multi-level Text Relationships
Literature Review (cont.)

Authors | Published | Mat Calc | Pair-wise | WordNet
Wei     | 2008 | x | x | x

Legend:
Mat Calc: Uses complex matrix calculations to filter and select sentences
Pair-wise: Compares pairs of text units closely for determining the score
WordNet: Used to determine semantically related words

Authors | Year | Advantages | Disadvantages
Wei     | 2008 | Introduces affinity relationships between text unit levels; text units paired and intersected with the query reduce noise | Very complex; needs a better formulation for creating vectors for singly observed terms (e.g., too much noise from WordNet without better constraint)
Semantic Analysis
Literature Review (cont.)

Authors   | Published | Mat Calc | Pair-wise | Structure | Coherence
Wang      | 2008 | x | x |   |
Harabagiu | 2010 |   |   | x | x

Legend:
Mat Calc: Uses complex matrix calculations to filter and select sentences
Pair-wise: Compares pairs of text units closely for determining the score
Structure: Adds ordering and/or proximity of text units to the scoring
Coherence: Attempts to improve the readability of the summary

Authors   | Year | Advantages | Disadvantages
Wang      | 2008 | Uses semantic analysis; text units paired reduces noise | Complex matrix reduction; performance only an above-average system
Harabagiu | 2010 | Complete semantic parsing; adds coherence for improvement | Heavy corpus training/machine learning; internally created KB; not easily replicable; fMDS vs. generic MDS
Literature Review (cont.)
Similar Research
Wang et al. 2008 and Harabagiu 2010 vs. Proposed Research

Aspect             | Wang | Harabagiu | Research System
Approach           | Complex matrix factorization | Corpus-trained, hand-crafted KB | Semantic triples clustering
Redundancy Removal | Semantic frames overlap, complex SNMF | Heavy, all argument roles | Light-weight, simpler semantic triples
Scoring            | Sentence-to-sentence semantic relationship more important than focus | Position, ordering | Semantic class scoring, semantic cues scoring
Training           | None | Extensive | None
Approach
Approach 1 - fMDS by Aggregated SDS via Semantic Linear Combination
− Applies to: What is the effect on overall system performance of using the semantic class scoring of sentences for improving fMDS?
• Stage 1 Algorithm: SDS via Semantic Linear Combination (SLC)
› Uses the combination of a feature set to create an aggregate score
› Introduces semantic class and semantic cue scoring to the feature set
• Stage 2 Algorithm: fMDS by SDS via SLC
› Uses all the scored sentences from Stage 1
› Introduces redundancy removal via cosine similarity

Approach 2 - fMDS by Semantic Triples Clustering and Aggregated SDS via SLC
› Uses only the aggregate scores from Approach 1
› Introduces semantic triples clustering for redundancy removal and sentence selection
− Applies to: What is the effect on overall system performance of clustering sentences based on semantic analysis for improving fMDS?
Approach (cont.)
Approach 3 - fMDS by Semantic Triples Clustering with Cluster Connections
› Uses the aggregate scores from Approach 1
› Uses the semantic triples clustering from Approach 2
› Introduces sentence intra- and inter-connectivity for redundancy removal and sentence selection
− Applies to: What is the effect on overall system performance of measuring the conceptual connectivity of sentence triples for improving fMDS?
Approach (cont.)
Approach 1
Stage 1 Algorithm: SDS via Semantic Linear Combination
Input: A Corpus (C) of topically related Documents (D) pre-processed into Sentences (S) by which shallow semantic analysis has been performed: the Named Entities (NE) have been labeled externally by GATE ANNIE.
Output: A summary (SUMM) is a subset of Sentences (S) from the input corpus documents up to a maxLength (i.e., SUMM = {S1, S2, ... , SN}, where N is the maximum number of sentences that could be added to the summary). SUMM contains the best sentences toward a single-document summary.
Approach (cont.)
Test Collection: newswire documents; Focus: “Airbus A380”
Output of System 100 (MDS on top of Semantic SDS Linear Combination [Semantic MDS]):
Here are some key dates in its development: January 23, 2002: Production starts of Airbus A380 components.
May 7, 2004: French Prime Minister Jean-Pierre Raffarin inaugurates the Toulouse assembly line.
Ravenel said sound levels near Charles de Gaulle airport normally reached about 40 decibels.
According to the source, Wednesday's flight may be at an altitude slightly higher than the some 10,000 feet (3,000
meters) achieved in the first flight, and could climb up to 13,000 feet.
Input documents (truncated):
2: AFP_ENG_20050116.0346
A 380 'superjumbo' will be profitable from 2008 : Airbus chief
PARIS , Jan 16
The A 380 'superjumbo', which will be presented to the world in a lavish ceremony in southern France on Tuesday , will be
profitable from 2008 , its maker Airbus told the French financial newspaper La Tribune .
"You need to count another three years ," Airbus chief Noel Forgeard told Monday 's edition of the newspaper when asked
when the break-even point of the 10 - billion-euro-plus ( 13 - billion-dollar-plus ) A 380 programme would come .
So far , 13 airlines have placed firm orders for 139 of the new planes , which can seat between 555 and 840 passengers
and which have a catalogue price of between 263 and 286 million dollars ( 200 and 218 million euros ) .
The break-even point is calculated to arrive when the 250 th A 380 is sold .
6: AFP_ENG_20050427.0493
Paris airport neighbors complain about noise from giant Airbus A 380
TOULOUSE , France , April 27
An association of residents living near Paris 's Charles-de- Gaulle airport on Wednesday denounced the noise pollution
generated by the giant Airbus A 380 , after the new airliner 's maiden flight .
French acoustics expert Joel Ravenel , a member of the Advocnar group representing those who live near Charles de Gaulle ,
told AFP he had recorded a maximum sound level of 88 decibels just after the aircraft took off from near the southwestern city
of Toulouse .
The figure makes the world 's largest commercial jet "one of the loudest planes that will for decades fly over the heads of the
four million people living in the area" outside Paris , Advocnar said in a statement .
Ravenel said sound levels near Charles de Gaulle airport normally reached about 40 decibels .
Journalists watching the Airbus A 380 's first flight at Toulouse airport in southwestern France , however , noted how quiet the
take-off and landing had seemed .
Tens of thousands of spectators cheered as the A 380 touched down at the airport near Toulouse , home of the European
aircraft maker Airbus Industrie , after a test flight of three hours and 54 minutes .
Flow of Approach 1 Stage 1 Algorithm: SDS via Semantic Linear Combination
Approach (cont.)
Approach (cont.)
Stage 1 Algorithm: Feature Set of Step 10
Aggregated Score Formula:
Aggregate Score = Σ_{i ∈ F} ω_i · f_i, where f_i is one of the features described in section 5.1, i is the index of the feature, ω_i is the weight of f_i, and F is the feature set that this research uses.
Flow of Approach 1 Stage 1 Algorithm: SDS via Semantic Linear Combination
Approach (cont.)
Approach 1
Stage 2 Algorithm: MDS by SDS via Semantic Linear Combination
• Input: A corpus (C) of topically related documents (D) pre-processed into the best sentences (S) from the Stage 1 SDS by Shallow Semantic Analysis.
• Output: A summary (SUMMm) is a subset of sentences (S) from the input documents up to maxLength (i.e., SUMMm = {Sx1, Sy2, ..., SzN}, where m refers to multi-document and the superscripts x, y, and z identify each sentence's containing document).
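Stage 2's redundancy removal via cosine similarity can be sketched as a greedy filter: walk the sentences in aggregate-score order and skip any sentence too similar to one already selected. The 0.5 threshold and bag-of-words representation are illustrative assumptions, not the dissertation's exact settings:

```python
# Greedy redundancy removal: add ranked sentences unless their cosine
# similarity to an already selected sentence exceeds a threshold.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_non_redundant(ranked_sentences, threshold=0.5, max_len=4):
    """ranked_sentences: token lists, highest aggregate score first."""
    selected = []
    for tokens in ranked_sentences:
        bow = Counter(tokens)
        if all(cosine(bow, Counter(s)) < threshold for s in selected):
            selected.append(tokens)
        if len(selected) == max_len:
            break
    return selected

ranked = [["riot", "spreads", "paris"],
          ["riot", "spreads", "paris", "towns"],   # near-duplicate, skipped
          ["airbus", "flight"]]
summary = select_non_redundant(ranked)
```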
Approach (cont.)
Flow of Approach 1 Stage 2 Algorithm: MDS via SDS Semantic Linear Combination
Approach 2
Algorithm: STC Focused MDS by SDS via Semantic Linear Combination
• Input: A corpus (C) of topically related documents (D) pre-processed into the best sentences (S) from the Stage 1 SDS by Shallow Semantic Analysis of Approach 1 and pre-processed for their subject-verb-object triples. Stage 2 MDS of Approach 1 is not used as part of this approach.
• Output: A summary (SUMMstc) is a subset of sentences (S) from the input corpus documents up to maxLength (i.e., SUMMstc = {Sx1, Sy2, ..., SzN}, where the superscripts x, y, and z identify each sentence's containing document).
Approach (cont.)
Approach 2 Algorithm: Example of Step 1
Semantic Triples
Example Sentence: According to police, the violence erupted after two boys, aged 14 and 16, died when they scaled a wall of an electrical relay station and fell against a transformer.
<Sentence s-v-o=“erupt:violence:*:f;die:boy:*:f;scaled:they:wall:f;”></Sentence>
triple 1: (erupt, violence, *)
triple 2: (die, boy, *)
triple 3: (scaled, they, wall)
This represents examples of sentences transformed into semantic triples. The circle node represents the verb, the first square node represents the subject, and the last square node represents the (direct) object if found.
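The s-v-o annotation above can be unpacked mechanically. A small sketch, where the "verb:subject:object:flag" segment layout is inferred from the example shown (with "*" marking a missing object), not taken from a published MultiPax specification:

```python
# Parse an s-v-o attribute string like the one in the example above into
# (verb, subject, object) triples. Segment layout verb:subject:object:flag
# is inferred from the example; "*" marks an absent (direct) object.
def parse_svo(attr):
    triples = []
    for segment in attr.strip(";").split(";"):
        verb, subject, obj, _flag = segment.split(":")
        triples.append((verb, subject, obj))
    return triples

triples = parse_svo("erupt:violence:*:f;die:boy:*:f;scaled:they:wall:f;")
# [('erupt', 'violence', '*'), ('die', 'boy', '*'), ('scaled', 'they', 'wall')]
```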
Flow of Approach 2 Algorithm: STC fMDS By SDS via SLC
Approach (cont.)
Approach 2 Algorithm: Example of Step 2
Semantic Triple Cluster Representation
• “The riot has spread to 200 city suburbs and towns, including Marseille, Nice, Toulouse, Lille, Rennes, Rouen, Bordeaux and Montpellier and central Paris, French police said.” → 2 semantic triples: (say, police, *), (spread, riot, *)
• “_ Nov. 4 _ Youths torch 750 cars, throw stones at paramedics, as violence spreads to other towns.” → 2 semantic triples: (throw, youth, stone), (spread, riot, *)
• “Rioting spreads to at least 20 Paris-region towns.” → 1 semantic triple: (spread, riot, *)
*The (spread, riot, *) triple (yellow in the original figure) signifies the main cluster semantic triple.
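The clustering step above can be sketched as grouping every sentence's triples by their core concept. Keying a cluster on the (verb, subject) pair is an illustrative simplification; lemmatization ("spreads" → "spread") is assumed to have happened upstream:

```python
# Sketch of semantic triples clustering: group triples from all sentences
# by their (verb, subject) concept, so every (spread, riot, *) triple
# falls into one cluster, as in the example above.
from collections import defaultdict

def cluster_triples(sentence_triples):
    """sentence_triples: {sentence_id: [(verb, subject, obj), ...]}.
    Returns {(verb, subject): [sentence_id, ...]} clusters."""
    clusters = defaultdict(list)
    for sid, triples in sentence_triples.items():
        for verb, subject, _obj in triples:
            clusters[(verb, subject)].append(sid)
    return clusters

sentences = {
    1: [("say", "police", "*"), ("spread", "riot", "*")],
    2: [("throw", "youth", "stone"), ("spread", "riot", "*")],
    3: [("spread", "riot", "*")],
}
clusters = cluster_triples(sentences)
main = max(clusters, key=lambda k: len(clusters[k]))  # densest cluster
```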
Flow of Approach 2 Algorithm: STC fMDS By SDS via SLC
Approach (cont.)
Approach 2 Algorithm: Example of Step 3
Semantic Triple Cluster Representation
• “The riot has spread to 200 city suburbs and towns, including Marseille, Nice, Toulouse, Lille, Rennes, Rouen, Bordeaux and Montpellier and central Paris, French police said.” → 2 semantic triples: (say, police, *), (spread, riot, *); the (spread, riot, *) triple is ranked higher because it is contained in a sentence with a high triple count
• “Rioting spreads to at least 20 Paris-region towns.” → 1 semantic triple: (spread, riot, *)
*The (spread, riot, *) triple signifies the main cluster semantic triple.
Flow of Approach 2 Algorithm: STC fMDS By SDS via SLC
Approach (cont.)
Approach 2 Algorithm: Example of Step 5
Query Overlap with the Semantic Triples
Query: “Paris Riots”
• “The riot has spread to 200 city suburbs and towns, including Marseille, Nice, Toulouse, Lille, Rennes, Rouen, Bordeaux and Montpellier and central Paris, French police said.” → 2 semantic triples: (say, police, *), (spread, riot, *); semantic triple overlap with the query = 1
• “Rioting spreads to at least 20 Paris-region towns.” → 1 semantic triple: (spread, riot, *); semantic triple overlap with the query = 1
*The (spread, riot, *) triple signifies the main cluster semantic triple.
Flow of Approach 2 Algorithm: STC fMDS By SDS via SLC
Approach (cont.)
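The query-overlap count in Step 5 can be sketched as matching a sentence's triple role fillers against the query terms. The crude shared-stem match ("Riots" ~ "riot") stands in for whatever stemming the system actually uses and is an assumption for illustration:

```python
# Sketch of Step 5's query overlap: count how many triple role fillers
# (verb, subject, object) match a query term. The suffix-stripping "stem"
# is a deliberately crude illustration, not the system's real stemmer.
def stem(word):
    return word.lower().rstrip("s")  # crude stemmer, illustration only

def triple_query_overlap(triples, query):
    query_stems = {stem(t) for t in query.split()}
    overlap = 0
    for verb, subject, obj in triples:
        roles = {stem(r) for r in (verb, subject, obj) if r != "*"}
        overlap += len(roles & query_stems)
    return overlap

triples = [("say", "police", "*"), ("spread", "riot", "*")]
overlap = triple_query_overlap(triples, "Paris Riots")  # "riot" matches -> 1
```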
Approach 3
Algorithm: STC fMDS by Connectivity
• Input: A corpus (C) of topically related documents (D) pre-processed into the best sentences (S) from the Stage 1 SDS by Shallow Semantic Analysis of Approach 1 and pre-processed for their subject-verb-object triples. Stage 2 MDS of Approach 1 is not used as part of this approach. In addition to the Stage 1 SDS processing from Approach 1, Approach 2 Steps 1-6 are used to collect the triples into their proper ordering, and the sentences are later ordered by the connections between these triples.
• Output: A summary (SUMMconn) is a subset of sentences (S) from the input corpus documents up to maxLength (i.e., SUMMconn = {Sx1, Sy2, ..., SzN}, where the superscripts x, y, and z identify each sentence's containing document).
Approach (cont.)
Flow of Approach 3 Algorithm: STC fMDS By Connectivity
Evaluation
Goal: To obtain a summary with a ROUGE score higher than the gold standard baseline system and to place well against the veteran automatic systems.
For the evaluation, the following methods were used in combination:
› Counting semantic classes and semantic cues to boost informative sentences
› A simpler semantic triples clustering method (including with focus)
Method:
› Gather human reference summaries and automated system summaries from the NIST 2008 Text Analysis Conference competition
› Use the evaluation script from the competition to compare research system summaries against all other automatic summaries
› Compare extrinsic evaluation ROUGE scores with the gold standard baseline system and other automatic systems
Evaluation
Test Collection: NIST Text Analysis Conference Data
Data used in the evaluation:
› Input for each focus is a collection of 10 newswire documents (example focus: “Paris Riots”)
› Each document has approximately 250-500 words across ~20 sentences
› Total input for each collection ranges from ~150-200 sentences
› 46 collections in total, for a total of about 10,000 sentences

Example Input (truncated):
<DOC id="AFP_ENG_20051028.0154" type="story" >
<HEADLINE>
Riot rocks Paris suburb after teenagers killed
</HEADLINE>
<DATELINE>
CLICHY-SOUS-BOIS, France, Oct 28
</DATELINE>
<TEXT>
<P>
Dozens of youths went on a rampage, burning vehicles and vandalising buildings in a tough Paris suburb Friday in an act of rage following the death by electrocution of two teenagers trying to flee police.
</P>
…
</DOC>
Evaluation
Evaluation Metrics:
Three ROUGE metrics from the NIST competitions are used to evaluate the performance of the proposed system: ROUGE-1, ROUGE-2, and ROUGE-SU4.
ROUGE is an N-gram co-occurrence statistic between a candidate system summary and a set of human model summaries. ROUGE-N (of which ROUGE-1 is the unigram case) is calculated as follows:

ROUGE-N = ( Σ_{S ∈ {Reference Summaries}} Σ_{gram_n ∈ S} Count_match(gram_n) ) / ( Σ_{S ∈ {Reference Summaries}} Σ_{gram_n ∈ S} Count(gram_n) )

Reference calculation: Four (4) human model summaries created by judges according to the NIST competition guidelines.
Gold Standard: Lead Baseline System: collects the first four (4) lines of the most recent document.
Evaluation
Evaluation Metric: ROUGE-1
Unigram co-occurrence statistic between a system summary and a set of four human reference summaries. ROUGE-1 is calculated as follows:
Semi-automatic summary vs. 1 reference summary
System candidate summary sentence:
Police detained 14 people Saturday after a second [night] of [rioting] that broke out in a working-class [Paris] suburb following the deaths of two youths who were electrocuted while trying to evade police.
(3 matching unigrams found)
Human reference summary sentence:
On successive [nights] the [rioting] spread to other parts of [Paris] and then to other cities.
(16 unigrams total)
Matching unigrams: {night}, {rioting}, {Paris}
ROUGE-1 = 3 / 16 = 0.1875
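The slide's arithmetic (3 content-word matches over 16 reference unigrams, with "nights" matching "night") can be reproduced in a few lines. The stop-word list and the suffix-stripping stem below are assumptions chosen to mirror the worked example, not the official ROUGE implementation:

```python
# Reproduce the slide's ROUGE-1 arithmetic: stop-word-free unigram matches
# over all reference unigrams. Stop-word list and crude stemmer are
# illustrative assumptions made to mirror the worked example.
STOPWORDS = {"on", "the", "to", "of", "and", "then", "a", "in", "that"}

def stem(w):
    return w.lower().rstrip("s")  # crude stemmer: "nights" -> "night"

def rouge_1_example(candidate, reference, stopwords=STOPWORDS):
    cand = {stem(w) for w in candidate.split() if w.lower() not in stopwords}
    ref_tokens = [stem(w) for w in reference.split()]
    matched = sum(1 for g in ref_tokens if g not in stopwords and g in cand)
    return matched / len(ref_tokens)

candidate = ("Police detained 14 people Saturday after a second night of rioting "
             "that broke out in a working-class Paris suburb following the deaths "
             "of two youths who were electrocuted while trying to evade police")
reference = ("On successive nights the rioting spread to other parts of Paris "
             "and then to other cities")
score = rouge_1_example(candidate, reference)  # 3 / 16 = 0.1875
```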
Evaluation
Evaluation Metric: ROUGE-SU4
Skip-bigram co-occurrence statistic that allows up to four (4) words to appear between the two (2) words of a bigram, as long as the words occur in the same order as in the human reference summary.
Semi-automatic summary vs. 1 reference summary
System candidate summary sentence:
Police detained 14 people Saturday after a second [night of rioting that broke out] in a working-class [Paris] suburb following the deaths of two youths who were electrocuted while trying to evade police.
(1 matching skip-bigram found)
Human reference summary sentence:
On successive [nights the rioting spread to other] parts of [Paris] and then to other cities.
(21 skip-bigrams total)
Matching skip-bigrams: {night rioting}
ROUGE-SU4 = 1 / 21 = 0.04762
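Only the skip-bigram extraction is sketched here: ordered word pairs with at most four intervening words. Full ROUGE-SU4 also counts unigrams and uses the official scripts' tokenization and stemming, so the crude "strip trailing s" stemming below is an illustrative assumption:

```python
# Extract skip-bigrams for ROUGE-SU4: all ordered word pairs with at most
# four words between them. Counting/stemming conventions vary between
# implementations; only the pair extraction is shown.
def skip_bigrams(tokens, max_skip=4):
    """Ordered pairs (tokens[i], tokens[j]) with <= max_skip words between."""
    pairs = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_skip + 2, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs

cand = "night of rioting that broke out".split()
ref = "nights the rioting spread to other".split()
# With crude trailing-"s" stemming, ("night", "rioting") appears in both.
common = ({(a.rstrip("s"), b.rstrip("s")) for a, b in skip_bigrams(cand)} &
          {(a.rstrip("s"), b.rstrip("s")) for a, b in skip_bigrams(ref)})
```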
Results: Approach 1
System Ranking by MDS via SDS Semantic Analysis Approach Variations
Discussion: Approach 1
• The poorer performances of Systems 5, 10R, and 16R show that stop word removal is absolutely necessary for improvement, even with the semantic analysis. Without it, the systems could not outperform the gold standard baseline system. Slight improvement is shown by the semantic analysis SDS-based MDS System 6 over its relative, System 5.
• The improved ROUGE score of System 9 over System 16P shows some importance for adding more semantic cueing and semantic class scoring to the selection of sentences. The weights are similar, but System 16P takes weight away from the semantic cueing and semantic class scoring and gives it to “df”; hence the drop.
• Another related class of tested systems comprises Systems 9, 10R, and 11H. These systems differ mostly in their alternative frequency measures. System 11H's use of the well-known tf*idf measure outperforms the pure “df” measure that the other two use. tf*idf, along with the semantic cueing and semantic class scoring, allowed System 11H to obtain a higher score than the gold standard baseline.
Results: Approach 2
System Ranking by STC Approach Variations
Discussion: Approach 2
• All systems displayed are instances of the STC MDS system, except System 0 and this work's System 100, which is fMDS by SDS SLC from Approach 1.
• Although its performance alone was not as promising compared to the veteran systems, the addition of System 3100's semantic triples clustering greatly improves performance, by more than 10 rankings over the gold standard baseline. System 3100 also used a minimum cluster density of 2 and a ranking method that gave preference to the cluster aggregate score over the cluster density.
• Systems 1600, 1200, and 2700 show improvement over the automatic gold standard baseline with semantic triples clustering alone; however, each additional ranking method shows added improvement.
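The System 3100 selection policy described above (minimum cluster density of 2, aggregate score preferred over density when ranking) can be sketched directly. The cluster records and score values below are illustrative, not taken from the evaluation data:

```python
# Sketch of the System 3100 policy: keep clusters whose density (triple
# count) is at least 2, then rank by cluster aggregate score before
# density. Cluster records here are illustrative.
def rank_clusters(clusters, min_density=2):
    """clusters: list of dicts with 'agg_score' and 'density' keys."""
    dense = [c for c in clusters if c["density"] >= min_density]
    return sorted(dense, key=lambda c: (c["agg_score"], c["density"]), reverse=True)

clusters = [
    {"id": "spread-riot", "agg_score": 3.2, "density": 3},
    {"id": "say-police", "agg_score": 3.4, "density": 2},
    {"id": "throw-youth", "agg_score": 1.1, "density": 1},  # filtered out
]
ranked = rank_clusters(clusters)  # say-police ranks first despite lower density
```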
Results: Approach 3
System Ranking by STC Connectivity Process
Discussion: Approach 3
• Systems Conn1 and Conn2 show only slight improvement over the gold standard baseline System 0 in Table 11, with System Conn2 performing the best with stop-word pruning of the clusters based on their semantic roles (i.e., if a stop-word was found within a subject, verb, or object slot, it was removed from consideration).
• Because these system variations show a minor drop in performance against the gold standard baseline system in the other ROUGE values, the approach does not perform satisfactorily, but it may be relevant for improvement in future work.
Conclusion
This work sought to answer the question: what effects does semantic analysis of sentences have on the improvement of focused multi-document summarization (fMDS)?
1. What is the effect on overall system performance of using the semantic class scoring of sentences for improving fMDS?
Even though tf*idf was shown to be extremely important in selecting the best sentences, it leaves a gap that this semantic analysis starts to fill. The semantic class and semantic cue scoring still improved several places over the gold standard baseline system.
Conclusion
2. What is the effect on overall system performance of clustering sentences based on semantic analysis for improving fMDS?
This work's System 3100 outperformed the gold standard baseline System 0 by over 10 places. This performance improvement is mainly attributed to the semantic analysis technique of filtering the sentences by clustering their semantic triples. The semantic triples represent the most basic "meaning" of the sentences during the filtering process.
However, a small drop in performance of two places was observed when attempting to focus the semantic triples themselves. This is possibly due to the absence of the focus terms within the main propositions of the sentences, even though those terms may appear elsewhere within the sentence. Using the query feature helped mitigate this effect for the semantic triples clustering in System 3100; hence, the better performance without the query overlap determination.
Conclusion
3. What is the effect on overall system performance of measuring the conceptual connectivity of sentence triples for improving fMDS?
Unfortunately, the technique used for sentence intra- and inter-connectivity did not perform well enough against the gold standard baseline system. This approach was able to capture slightly more vocabulary, as denoted by its slightly higher ROUGE-1 score, but its other scores were slightly lower compared to the performance of the gold standard baseline system.
Important note: within all three semantic analysis approaches, no word sense disambiguation (WSD) was performed. Even for terms within semantic roles across multiple triples that can be clustered together, their actual "meaning" may be different. It would be worthwhile to add WSD utilizing the words that appear around each discovered role. This may help improve the accuracy of the systems implemented in this research.
Contributions:
› Provides an improvement over the gold standard baseline by more than ten positions.
› Proposes “semantic triples clustering” along with “semantic class scoring” and “semantic cue scoring” as methods to improve extractive fMDS.
› Provides a comparison of the semantic analysis techniques' performance for fMDS that can be used later for new abstractive summarization.
› Created a light-weight, portable fMDS system with no training.
Conclusion
Significance:
› Improved over the gold standard baseline: the research provides a more comparable semantic analysis against fundamental techniques and a gold standard baseline.
› More domain-independent improvement due to no need for training: it can be used as a new baseline and can be tested easily on other corpora.
› Simpler and less expensive than extensively corpus-trained ‘a priori’ systems: other veteran methods are too expensive and time-consuming to reproduce. This research does not rely on extensive corpus training or on building tailored, domain-dependent resources, and does not carry the associated cost in time.
› Compressing the sentences into a basic form of meaning takes a step in the direction of an abstractive technique: the semantic triples used for this extractive fMDS can be modified to take a step into the relatively unexplored area of abstractive fMDS. Humans tend to extract whole sentences from documents to create a summary; however, they also shorten, move, and/or infuse information depending upon importance and length.
Further Work
› Direct semantic triplet summaries
› Weight dampening
› Advanced semantic class analysis
Publications
Submitted:
Israel, Quinsulon L., Hyoil Han, and Il-Yeol Song. Semantic analysis for focused multi-document summarization of text. Submitted to ACM Symposium on Applied Computing (SAC) 2015.
Israel, Quinsulon L., Hyoil Han, and Il-Yeol Song (2010). Focused multi-document summarization: human summarization activity vs. automated systems techniques. Journal of Computing Sciences in Colleges. 25(5): 10-20.
Publication Plan: Journals
Fall 2014: Submit to the Journal of Intelligent Information Systems:
› Covers the integration of artificial intelligence and database technologies to create next generation Intelligent Information Systems
› http://www.springer.com/computer/database+management+%26+information+retrieval/journal/10844
Submit to Information Processing & Management:
› Covers experimental and advanced processes related to information retrieval (IR) and a variety of information systems, networks, and contexts, as well as their implementations and related evaluation.
› http://www.journals.elsevier.com/information-processing-and-management
References
• Conroy, J. M., J. D. Schlesinger, et al. (2006). Topic-focused multi-document summarization using an approximate oracle score. Proceedings of the COLING/ACL on Main conference poster sessions. Sydney, Australia, Association for Computational Linguistics: 152-159.
• Dang, H. T. (2006). Overview of the DUC 2006. Document Understanding Conference.
• Edmundson, H. P. (1969). New Methods in Automatic Extracting. J. ACM 16(2): 264-285.
• Harabagiu, S. and F. Lacatusu (2010). "Using topic themes for multi-document summarization." ACM Transactions on Information Systems 28(3): 1-47.
• Ouyang, Y., S. Li, et al. (2007). Developing learning strategies for topic-based summarization. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. Lisbon, Portugal, ACM: 79-86.
• Nenkova, A. and K. McKeown (2003). References to named entities: a corpus study. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2. Edmonton, Canada, Association for Computational Linguistics: 70-72.
• Yih, W.-T., J. Goodman, et al. (2007). Multi-document summarization by maximizing informative content-words. International Joint Conferences on Artificial Intelligence, Hyderabad, India.
• Wan, X. and J. Yang (2008). Multi-document summarization using cluster-based link analysis. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. Singapore, Singapore, ACM: 299-306.
• Wang, D., T. Li, et al. (2008). Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. Singapore, Singapore, ACM: 307-314.
• Wei, F., W. Li, et al. (2008). Query-sensitive mutual reinforcement chain and its application in query-oriented multi-document summarization. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. Singapore, Singapore, ACM: 283-290.