
[IEEE 2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS) - Sanur Bali, Indonesia, 2013.09.28-2013.09.29]



Abstract—Extractive summarization is a widely studied and fairly easy to implement technique. It works by choosing the most important parts of one or more documents as a summary. However, this can lead to a lack of coherence in the summary itself. In this study, the principle of continuity in Centering Theory is used to maintain the entity coherence between subsequent sentences obtained from an extractive news summarizer. Simultaneously, the relative order of sentences belonging to the same source document is maintained. These two considerations are implemented as fitness functions for a genetic algorithm that is used to obtain the optimal ordering of sentences in the summary. Based on the results of our study involving human judgment, a weighted fitness function combining 75% continuity and 25% relative order yields the most acceptable sentence ordering.

I. INTRODUCTION

Automatic text summarization is the process of producing a simplified version of a document or a set of documents about a specific topic while minimizing the omission of important points from the source(s). It has various applications, e.g. producing document outlines, headlines of news articles, snippets in search engine results, answers for non-factoid question answering systems, or simply a summary of any kind of document [1]. Depending on the nature of the source documents, there are two variants of the summarization task: single and multiple document summarization. For example, writing the headline or outline of a news article is an instance of single document summarization, whereas producing an answer for a non-factoid question may involve multiple document summarization [1].

A summary can be produced through abstractive summarization, i.e. generating a new text based on a re-representation constructed from the original document(s), or through extractive summarization, which works by choosing the most important parts of the original document(s) [2]. Theoretically, abstractive summarization is able to produce more human-like summaries that are easy to understand, but capturing a representation of the semantics of the source documents is extremely challenging [3]. Extractive summarization is a widely studied and fairly easy to implement technique. It can produce a list of important information from the source documents, but this approach has several problems, one of which is the lack of coherence between pieces of information appearing in the summary, especially concerning dangling anaphora [3].

In extractive summarization, there are two main considerations for building a summary, namely how to choose the most important information appearing in the source documents [4, 5] and how to present the chosen information in a coherent manner in the resulting summary [6, 7]. The former has received much attention in previous research, using statistical and/or linguistic approaches. However, the latter problem of coherence remains largely unsolved. This problem arises because a summary is constructed from sentences with high relevance scores drawn from various sources, often originating from different articles. Dangling anaphora and other coherence issues thus become unavoidable.

This paper explores how the results of extractive summarization can be further processed to yield a more coherent summary, i.e. one that is easier to understand thanks to a good flow of information delivery. Centering Theory, a well-known coherence analysis method, is used to obtain the optimal ordering of sentences in the summary. Related work is discussed in Section II, our proposed method is presented in Section III, followed by experiments and evaluation results in Section IV, and in the last section we discuss our initial conclusions.

Improving Coherence by Reordering the Output of Extractive Summarization using Centering Theory through Genetic Algorithm

Arlisa Yuliawati, Ruli Manurung
Laboratory of Information Retrieval, Faculty of Computer Science, Universitas Indonesia
Email: [email protected], [email protected]


ICACSIS 2013 ISBN: 978-979-1421-19-5 ©2013 IEEE


II. RELATED WORK

A. Extractive Summarization

Extractive summarization works by calculating a score for each sentence, then taking the N sentences with the highest scores. The process consists of three main steps, as shown in Fig. 1 [1].

The process starts by choosing the content of the summary (content selection), followed by reordering the selected content (information ordering), and finishes by realizing sentences as the final output of the summarization system (sentence realization). In multi-document summarization, content selection entails selecting the most salient points of each document while removing redundant information. Information ordering is another important process in the pipeline, as a good sequence of information presentation can yield an easy-to-understand, readable summary. In single document summarization, sentences can be ordered according to their appearance in the source document. However, multi-document summarization requires a more sophisticated treatment, since sentences come from different source documents.

Previous work by Barzilay et al. uses chronological ordering to arrange the sentences within a summary based on temporal information [8]. This approach can be used when the source documents contain event chronology information. Another approach, by Aksoy et al., calculates the relative position of each sentence in its original document, assuming that each document has the same flow of information [9]. The position of each sentence is calculated using a ratio of sentence length to the length of its original document. This approach depends on the quality of the information flow in each document. Concerning entity relationships, Trandabat summarizes Romanian folklore by identifying the main character of the story from the predicate arguments of each entity, then choosing sentences related to the main character [10]. Using Centering Theory, a concept for anaphora resolution, [6] chooses highly related sentences based on the semantic transition between each adjacent sentence pair. Another study [11] uses this concept to identify a pattern of semantic transitions in the summary by training on human-annotated summaries. However, the results show that it is difficult to find a specific transition pattern that produces a good summary.

B. Coherence Analysis

Coherence refers to the property of a discourse being easy to read and understand, which can be obtained by ordering its elements appropriately [12]. It is also described as the association of elements to form a unified whole in a discourse. A more specific example of coherence is the relationship between sentences or entities in a discourse; this relation is known as local coherence [13].

Using local coherence, specified as entity coherence, [14] applies Centering Theory to a text structuring process. That research uses an evolutionary process to find a better ordering, with the principle of continuity as the fitness function. The results show that reordering sentences by considering only the principle of continuity, i.e. the entity relationship between adjacent sentence pairs, leads to better results than the other metrics evaluated.

III. METHODOLOGY

A. Overview

In this paper we apply sentence reordering to the output of extractive summarization using the principle of continuity as used by [14]. We use news document collections that are summarized using an available summarization tool, and reconstruct the order of sentences based on entity relationships. The main challenge is that news documents differ from typical story-based documents, which usually talk about clearly defined characters: news documents usually focus on the event, not on a person or main character. Indeed, experimental results from [15] show that sentence reordering for multi-document news summarization using both event and entity information outperforms using entity information alone. Accordingly, we use the principle of continuity as the entity information, and the relative order of each sentence in its original document as the event information, to reconstruct the order of sentences resulting from extractive summarization.2 Both considerations are used as fitness functions for a stochastic search, specifically a genetic algorithm.

B. Principle of Continuity

The principle of continuity describes a condition where each utterance in a discourse refers to at least one entity in the preceding utterance [14]. In this study, we define an utterance as a sentence in a summary. The principle of continuity is part of Centering Theory, a concept for anaphora resolution, which makes it suitable for entity coherence analysis. The semantic transition between each adjacent sentence pair can be identified from the entities in each utterance.

2 In our experiments we use MEAD, a popular extractive multi-document summarization tool (http://www.summarization.com/mead).

Fig. 1. Extractive Summarization Flow

To be able to identify the semantic transition



between a pair of sentences, we need to specify three main components, namely the forward looking center, the backward looking center, and the preferred center [1, 11, 12]. The forward looking center, Cf(Un), is the list of entities in an utterance, ordered by grammatical role. The backward looking center, Cb(Un), is the highest-ranking entity in the intersection of the Cf lists of two adjacent utterances; it describes an important entity mentioned in an utterance that is also mentioned in the following utterance. Lastly, the preferred center, Cp(Un), is the center of an utterance, i.e. the highest ranking entity in Cf(Un). Once these components have been identified, there are four possible semantic transitions, i.e. Continue, Retain, Smooth-shift, and Rough-shift, as presented in Table I. As an example, consider the two sentences below [1]:

1) John saw a beautiful Ford Falcon (U1)
2) He showed it to Bob (U2)

The underlined words are the entities found in each sentence. Assuming we have determined the grammatical role for each entity, we can then specify the three main components of Centering Theory as follows:

- Cf(U1) = {John, Ford Falcon},
- Cf(U2) = {He (John), it (Ford Falcon), Bob},
- Cp(U1) = Cp(U2) = John,
- Cb(U1) = Ø, whereas Cb(U2) = John.

Subsequently, we obtain Cb(U1) = Ø and Cp(U1) = Cp(U2). Thus, based on Table I, the semantic transition between the first sentence (U1) and the second sentence (U2) is Continue. This semantic transition is what we use to maintain the continuity between sentences in the summary. The principle of continuity is described by the following equation: Cf(Un-1) ∩ Cf(Un) ≠ Ø. This equation states that two adjacent utterances must be connected by some specific entity, as shown in the example above.
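As an illustrative sketch (not the paper's implementation), the identification of Cb and Cp and the resulting transition classification can be expressed in a few lines, assuming entities have already been resolved and ranked by grammatical role; the function names here are hypothetical:

```python
# Hypothetical sketch of Centering Theory transition classification.
# Each Cf list is assumed to be ordered by grammatical-role rank.

def backward_center(cf_prev, cf_curr):
    """Cb(Un): highest-ranked entity of Cf(Un-1) that also occurs in Cf(Un)."""
    for entity in cf_prev:          # cf_prev is ordered by rank
        if entity in cf_curr:
            return entity
    return None                     # empty Cb: continuity is violated

def transition(cf_prev, cf_curr, cb_prev):
    """Classify the transition between two adjacent utterances (BFP-style)."""
    cb = backward_center(cf_prev, cf_curr)
    cp = cf_curr[0] if cf_curr else None   # Cp(Un): highest-ranked entity
    if cb is None:
        return "No transition"
    if cb == cp:
        return "Continue" if cb_prev in (None, cb) else "Smooth-shift"
    return "Retain" if cb_prev in (None, cb) else "Rough-shift"

# The John / Ford Falcon example from the text (pronouns already resolved):
cf_u1 = ["John", "Ford Falcon"]
cf_u2 = ["John", "Ford Falcon", "Bob"]
print(transition(cf_u1, cf_u2, None))   # Continue: Cb(U2) = Cp(U2) = John
```

Running this on the example reproduces the Continue transition derived above, since Cb(U2) = Cp(U2) = John and Cb(U1) is empty.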

C. Reordering by Evolutionary Process

The main task in the implementation of Centering Theory is entity recognition and anaphora resolution in each utterance. As seen in the example above, U2 contains two pronouns. Before identifying the components of Centering Theory, we first have to resolve the referent of each pronoun; of course, we need to recognize each entity in the sentence before performing anaphora resolution.

Fig. 2 shows the preparation process before carrying out the main sentence reordering process. To keep the referent of each referring entity, anaphora resolution is performed before the summarization process, and its results are stored for later use. In addition, each entity in the collection is annotated, and the subject and object of each sentence are identified. We use several components provided by GATE3 for these tasks, i.e. ANNIE NER for entity annotation, the Pronoun Annotator and Pronominal Coreference for anaphora resolution, and MultiPaX for subject-object identification. The annotation results must be accessible from the summarization system, so that the summarization result retains all information from the annotation process. This information is necessary to identify the forward, backward, and preferred centers.

The main evolutionary process using genetic algorithms starts from an initial population consisting of N randomly generated orderings of the sentences in the summary. Each ordering is evaluated against the stopping criteria, e.g. whether it is an optimal ordering with respect to the chosen fitness function, or whether the maximum number of generations has been reached. While these criteria have not been met, genetic operators (mutation and crossover) are applied to obtain new evolved individuals that are re-evaluated until a stopping criterion is fulfilled. Fig. 3 illustrates the evolutionary process using the genetic algorithm. In our experiments we use ECJ4, a Java-based evolutionary computation library.
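The loop just described (generate a population of orderings, evaluate, breed until a stopping criterion holds) might be sketched as follows. This is a minimal illustration, not the paper's ECJ setup: the `fitness` here is a toy placeholder rather than Equation (1), and the selection and mutation choices are one simple possibility.

```python
import random

def fitness(order):
    # Toy placeholder fitness: rewards orderings closer to the identity
    # permutation, purely so the loop has something to optimize.
    return sum(1 for i, s in enumerate(order) if s == i)

def evolve(n_sentences, pop_size=200, generations=500, seed=0):
    rng = random.Random(seed)
    # Initial population: random permutations of sentence indices.
    population = [rng.sample(range(n_sentences), n_sentences)
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        if fitness(best) == n_sentences:       # optimum reached: stop early
            break
        # Tournament selection followed by a simple swap mutation.
        parents = [max(rng.sample(population, 3), key=fitness)
                   for _ in range(pop_size)]
        population = []
        for parent in parents:
            child = parent[:]
            i, j = rng.randrange(n_sentences), rng.randrange(n_sentences)
            child[i], child[j] = child[j], child[i]
            population.append(child)
        population[0] = best                   # elitism: keep best individual
        best = max(population, key=fitness)
    return best

print(evolve(6))
```

With elitism, the best fitness is non-decreasing across generations, which matches the convergence behaviour reported in Section IV.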

3 http://gate.ac.uk/
4 http://cs.gmu.edu/~eclab/projects/ecj/

TABLE I: SEMANTIC TRANSITION BASED ON BFP ALGORITHM

Fig. 2. Preparation Process (Document Collection → Entity Annotation + Anaphora Resolution → Predicate Argument Extraction → Summarization → Summary (list of sentences) → Evolutionary Process)

Fig. 3. Evolutionary Process



1) Fitness Function

The quality of a sentence ordering is denoted by its fitness value. The principle of continuity is the main consideration in the calculation of the fitness value: the more Continue transitions found between sentence pairs, the better the fitness of the individual (sentence ordering), as this means there are more interrelated sentence pairs in the summary. Simultaneously, the relative order of sentences belonging to the same source document is maintained. The fitness function is shown in Equation (1):

F = WCo · NCo + WRe · NRe    (1)

In this equation, WCo is the weight of the continuity component, whereas WRe is the weight of the relative ordering of sentences belonging to the same source document. NCo is the number of Continue transitions found in an ordering, whereas NRe is the number of sentences belonging to the same source document whose relative ordering is preserved. Using this equation, we want to know which component is more important. The main question is: is it enough to find the best ordering of sentences using only the entity relationships among sentences, or is it important to preserve the relative ordering of sentences as they appear in the source documents? In our experiments, we try out various weighting schemes for WCo and WRe.

2) Genetic Operators

We use both mutation and crossover genetic operators in the breeding process. While the best solution obtained from the evolutionary process is not yet optimal, the breeding process is applied to produce a new, evolved population of individuals, i.e. sentence orderings.

Mutation

There are four mutation operators used in this study:
• Partial random mutation with consideration of continuity. All sentence positions except those involved in a Continue transition are shuffled.
• Mutation based on continuity pairs and sentence relative order. Sentences are sorted based on their original relative ordering while keeping sentence pairs involved in Continue transitions adjacent.
• Total random mutation. All sentences are randomly shuffled without any consideration.
• Partial random mutation. Only the ordering of sentences within a specific range is shuffled.
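One possible reading of the first operator, shuffling every position except those locked by a Continue transition with a neighbour, can be sketched as follows. The `locked` parameter (a set of frozen positions) and the function name are our own illustrative assumptions, not the paper's code:

```python
import random

def partial_random_mutation(order, locked, rng=random):
    """Shuffle all positions except those in `locked` (assumed to be the
    positions of sentences involved in a Continue transition)."""
    free_positions = [i for i in range(len(order)) if i not in locked]
    free_values = [order[i] for i in free_positions]
    rng.shuffle(free_values)            # shuffle only the unlocked sentences
    mutated = order[:]
    for pos, val in zip(free_positions, free_values):
        mutated[pos] = val
    return mutated

order = [3, 0, 2, 1, 4]
print(partial_random_mutation(order, locked={1, 2}))  # positions 1, 2 kept
```

The same skeleton covers total random mutation (`locked` empty) and partial random mutation over a range (`locked` set to everything outside the range).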

Crossover

There are three main crossover operators in this study:
• Redundancy removal crossover. One-point crossover with redundancy removal and reinsertion of missing elements. In our experiments we employ several variants of this operator, i.e. left and right redundancy removal, which remove redundant elements and reinsert missing elements from the left or the right respectively, and coin-flip redundancy removal, which randomly decides the direction of removal and reinsertion (from the left or the right).
• Order crossover. Two parents contribute the elements within a specific segment to their child, in the same positions. The remaining unfilled positions are filled with elements obtained from the other parent, starting from the position after that segment.
• Crossover with a penalty for redundant elements. A penalty is imposed for each redundancy in an ordering: the fitness from Equation (1) is reduced by the number of redundancies in the ordering (NS), as shown in Equation (2):

F = WCo · NCo + WRe · NRe − NS    (2)
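The order crossover described above is a standard permutation operator; a self-contained sketch (with the crossover segment passed in explicitly rather than chosen at random) looks like this:

```python
def order_crossover(p1, p2, start, end):
    """Child inherits p1[start:end] in place; remaining positions are filled
    with p2's elements in order, starting after the segment, skipping any
    element already taken from p1."""
    n = len(p1)
    child = [None] * n
    child[start:end] = p1[start:end]               # segment copied in place
    taken = set(child[start:end])
    # p2's elements, read circularly starting just after the segment:
    fill = [x for x in p2[end:] + p2[:end] if x not in taken]
    positions = list(range(end, n)) + list(range(start))
    for pos, val in zip(positions, fill):
        child[pos] = val
    return child

print(order_crossover([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], 1, 3))  # [4, 2, 3, 1, 5]
```

Because every element appears exactly once in the child, this operator never introduces the redundancies that the removal and penalty variants exist to handle.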

Together with the weighting combinations of the two fitness considerations, i.e. continuity and original sentence relative ordering, the various mutation and crossover operators are also treated as parameters to be experimented with.
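As a concrete sketch, Equation (1) might be computed as below. Two simplifications are our own assumptions, not the paper's definitions: a Continue transition is approximated as a non-empty entity overlap between adjacent sentences, and NRe is counted over adjacent same-document pairs whose original positions are in order. The `entities` and `doc_pos` inputs stand in for the output of the annotation stage.

```python
def fitness(order, entities, doc_pos, w_co=0.75, w_re=0.25):
    """Weighted fitness in the spirit of Equation (1).
    entities: sentence id -> set of resolved entities (assumed input)
    doc_pos:  sentence id -> (source document id, original position)"""
    # N_Co approximation: adjacent pairs sharing at least one entity.
    n_co = sum(1 for a, b in zip(order, order[1:])
               if entities[a] & entities[b])
    # N_Re approximation: adjacent same-document pairs in original order.
    n_re = sum(1 for a, b in zip(order, order[1:])
               if doc_pos[a][0] == doc_pos[b][0]
               and doc_pos[a][1] < doc_pos[b][1])
    return w_co * n_co + w_re * n_re

entities = {0: {"storm"}, 1: {"storm", "coast"}, 2: {"coast"}}
doc_pos = {0: ("d1", 0), 1: ("d1", 1), 2: ("d2", 0)}
print(fitness([0, 1, 2], entities, doc_pos))   # 0.75*2 + 0.25*1 = 1.75
```

Swapping the `w_co`/`w_re` defaults reproduces the 100%-0%, 50%-50%, and 0%-100% weighting schemes evaluated in Section IV.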

IV. EXPERIMENTAL RESULTS AND ANALYSIS

A. Data Description

Three document collections are used in this study; the characteristics of each collection are described in Table II. The Gilbert Hurricane collection consists of 6 articles about a hurricane in 1988 affecting areas around the Caribbean Sea. The Gulf Air collection covers an accident involving a Gulf Air flight in 2000 in Bahrain. Lastly, the Taufik Hidayat collection contains news about the retirement of Taufik Hidayat, an Indonesian badminton player, in June 2013. Each collection becomes the input data for the MEAD summarizer. We use 10 sentences as the target summary length, which is approximately 30% of the length of the Gilbert Hurricane collection, 10% of the Gulf Air collection, and 27% of the Taufik Hidayat collection.

B. Experiments concerning GA Parameters

A genetic algorithm has many parameters to be defined. To find the best combination of parameters, we conducted a preliminary experiment. The parameters experimented with are (i) the probability of mutation and crossover, (ii) the selection algorithm used to choose the parents, (iii) the number of individuals in a population, (iv) the maximum number of generations, and (v) the use of elitism. We experimented with 576 parameter combinations and chose the best combination based on the fitness value obtained. This best combination was subsequently used to choose the most appropriate mutation and crossover operators. There are 20 different operator combinations, and once again we chose the combination that yields the highest fitness



scores. Both experiments were run on the Gulf Air collection with a target summary length of 20% of the length of the input data. In each experiment, regardless of parameter configuration, the fitness value tends to increase rapidly at the beginning and converges slowly after about half of the iterations. In the end, the result of this experiment is the following configuration: (i) a mutation probability of 0.5 and a crossover probability of 1.0; (ii) fitness-proportionate and best selection as the selection algorithms; (iii) 200 individuals in a population; (iv) a maximum of 500 generations; (v) elitism to retain the best candidate solutions in the population; (vi) partial random mutation; and (vii) the coin-flip redundancy removal crossover operator.

C. Experiments concerning Objective Functions

This experiment aims to identify which considerations should be prioritized to obtain the best possible sentence ordering. As mentioned above, the considerations used are the principle of continuity and the maintenance of the original sentence relative ordering. These considerations are the main elements of the fitness function described in Equation (1). Experiments were held combining the weights of both considerations. We use five weighting combinations, represented as percentage pairs: 100%-0%, which describes the use of only the principle of continuity; 75%-25%, 50%-50%, and 25%-75%, which combine both considerations with different weights; and finally 0%-100%, which describes the use of only the original sentence relative ordering.

Since each weighting combination yields different raw fitness scores, we cannot simply compare the results based on the fitness scores. Thus, we use human judgments to evaluate the ordering produced by each weighting combination. Using the three collections described in Table II, each case is experimented on with the five weighting combinations. In total there are 15 orderings, five ordering variants per case. To compare the results of the evolutionary process against a random baseline, we also include a completely random ordering for each case. Thus there are six variants of sentence orderings which must be evaluated by each judge. Our experiments involved twenty-one (21) judges, so each case was evaluated by seven judges.

We used an online questionnaire to gather the evaluations from all judges, who were all either undergraduate or masters students in computer science. The evaluation covers four aspects: the ease of understanding the summary, the order of information flow, the ease of drawing conclusions from the summary, and the amount of entity-based coherence found in each ordering. These aspects are represented by three statements and one question: "the content of the summary is easy to understand", "there is no information gap", "it is easy to conclude the content of the summary", and "how many entity relationships are there within the summary?".

We provide five options for each statement/question. For the statements, a judge can choose one of: Completely agree, Slightly agree, Neutral/unsure, Slightly disagree, and Completely disagree. For the last question, the options are: 100% related, >50% related, Neutral/unsure, <50% related, and No relation at all (0%). Each option is given a score, starting from 4 for the first option and decreasing by one for each subsequent option. The score is multiplied by the number of votes from the judges to obtain the accumulated score for each sentence ordering. The evaluation results are shown in Figs. 4 to 7 as total scores for the five variants of sentence order based on the given weighting schemes and the one variant with random order.
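To make the scoring scheme concrete, here is a worked example for one ordering on one statement; the vote counts are hypothetical, chosen only so that the seven judges per case are accounted for:

```python
# Options score 4 down to 0; the accumulated score for an ordering is the
# sum of option score × number of votes for that option.
scores = {"Completely agree": 4, "Slightly agree": 3, "Neutral/unsure": 2,
          "Slightly disagree": 1, "Completely disagree": 0}
votes = {"Completely agree": 3, "Slightly agree": 2, "Neutral/unsure": 1,
         "Slightly disagree": 1, "Completely disagree": 0}  # 7 judges total

total = sum(scores[opt] * n for opt, n in votes.items())
print(total)   # 4*3 + 3*2 + 2*1 + 1*1 + 0*0 = 21
```

The per-criterion totals in Figs. 4 to 7 are accumulations of this kind over all judges assigned to a case.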

For each evaluation criterion, the sentence ordering obtained using the 75%-25% weighting combination for the fitness function consistently appears in first place. This means that, based on human judgment, this weighting combination is the most likely to produce a good ordering of the sentences from the extractive summarization result. From these results, we can also see that sentence orderings produced by the 0%-100% weighting combination, i.e. using only the maintenance of the original sentence relative ordering, also yield high scores: except for the ease of drawing conclusions, this combination is always in second place after the 75%-25% combination. Conversely, if we use only the principle of continuity (100%-0%), the resulting orderings are not viewed favorably by our human judges. As expected, the random sentence ordering is consistently ranked the worst on all criteria.

V. CONCLUSION

From the experimental results and the evaluation by human judges, we can highlight several things. Firstly, considering both the principle of continuity and the original sentence relative ordering can produce an acceptable sentence ordering for extractive summarization results. Second, it

TABLE II: CHARACTERISTICS OF DATA COLLECTION

Collection          Source            Articles in collection   Sentences
Gilbert Hurricane   DUC 2002          6                        33
Gulf Air            DUC 2002          3                        97
Taufik Hidayat      The Jakarta Post  3                        37



is unavoidable that, especially for news documents, the original sentence relative ordering is a crucial consideration. It is quite hard to force the order of sentences to satisfy continuity in every sentence pair: although the number of entity relationships is then relatively high, the ease of understanding, the flow of information, and the ease of drawing conclusions receive lower scores than with the other configurations. Lastly, compared to the baseline of random sentence ordering, the use of both considerations, especially with the 75%-25% weighting combination, yields consistently better evaluation scores.

REFERENCES

[1] D. Jurafsky and J. H. Martin, Speech and Language Processing, New Jersey: Pearson Education, 2009.
[2] E. Hovy and C. Y. Lin, "Automated text summarization and the SUMMARIST system," in A Workshop on TIPSTER, 1998.
[3] V. Gupta and G. S. Lehal, "A survey of text summarization extractive techniques," Journal of Emerging Technologies in Web Intelligence, vol. 2, no. 3, pp. 258-268, 2010.
[4] L. Suanmali, N. Salim and M. S. Binwahlan, "Fuzzy genetic semantic based text summarization," in Ninth International Conference on Dependable, Autonomic, and Secure Computing, Sydney, NSW, 2011.
[5] L. Suanmali, N. Salim and M. S. Binwahlan, "SRL-GSM: a hybrid approach based on semantic role labeling and general statistic method for text summarization," Journal of Applied Sciences, vol. 10, no. 3, pp. 166-173, 2010.
[6] H. Kamyar, M. Kahani, M. Kamyar and A. Poormasoomi, "An automatic linguistic approach for Persian document summarization," in International Conference on Asian Language Processing, Penang, 2011.
[7] A. Poormasoomi, M. Kahani, S. V. Yazdi and H. Kamyar, "Context-based Persian multi-document summarization (global view)," in International Conference on Asian Language Processing, Penang, 2011.
[8] R. Barzilay, N. Elhadad and K. R. McKeown, "Inferring strategies for sentence ordering in multidocument news summarization," Journal of Artificial Intelligence Research, vol. 17, pp. 35-55, 2003.
[9] C. Aksoy, A. Bugdayci, T. Gur, I. Uysal and F. Can, "Semantic argument frequency based multi-document summarization," in The 24th International Symposium on Computer and Information Sciences (ISCIS), Guzelyurt, 2009.
[10] D. Trandabat, "Using semantic roles to improve summaries," in The 13th European Workshop on Natural Language Generation, Nancy, France, 2011.
[11] L. Hasler, "An investigation into the use of centering transitions for summarisation," in The 7th Annual CLUK Research Colloquium, Birmingham, UK, 2004.
[12] N. Karamanis, M. Poesio, C. Mellish and J. Oberlander, "Evaluating centering for information ordering using corpora," Computational Linguistics, vol. 35, no. 1, pp. 29-46, 2008.
[13] M. Tofiloski, "Extending centering theory for the measure of entity coherence," Simon Fraser University, 2009.
[14] N. Karamanis and R. Manurung, "Stochastic text structuring using the principle of continuity," in INLG, New York, 2002.
[15] R. Zhang, W. Li and Q. Liu, "Sentence ordering with event-enriched semantics and two-layered clustering for multi-document news summarization," in Coling 2010 (Poster Volume), 2010.

Fig. 4. The Ease of Understanding the Summary (bar chart: total score per variant of sentence order)

Fig. 5. The Order of Information Flow (bar chart: total score per variant of sentence order)

Fig. 6. The Ease of Concluding the Summary (bar chart: total score per variant of sentence order)

Fig. 7. The Number of Entity Relationships (bar chart: total score per variant of sentence order)
