21
哈哈哈哈哈哈哈哈哈哈 哈哈哈哈哈哈哈哈哈哈 HITIR’s Update Summary at T AC2008 Extractive Content Selectio n Using Evolutionary Manifo ld-ranking and Spectral Clu stering Reporter: Ph.d candidate He Ruifang [email protected] Information Retrieval Lab School of Computer Scie nce and Technology Harbin Institute of Tec hnology, Harbin, China

Reporter: Ph.d candidate He Ruifang [email protected] Information Retrieval Lab

  • Upload
    liko

  • View
    45

  • Download
    0

Embed Size (px)

DESCRIPTION

HITIR ’ s Update Summary at TAC2008 Extractive Content Selection Using Evolutionary Manifold-ranking and Spectral Clustering. Reporter: Ph.d candidate He Ruifang [email protected] Information Retrieval Lab School of Computer Science and Technology - PowerPoint PPT Presentation

Citation preview

Page 1: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

HITIR’s Update Summary at TAC2008Extractive Content Selection Using Evolutionary Manifold-ranking and Spectral Cl

ustering

Reporter: Ph.d candidate He [email protected]

Information Retrieval Lab School of Computer Science and Technology Harbin Institute of Technology, Harbin, China

Page 2: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Evaluation rank

Three top 1 in PYRAMID average modified(pyramid) score average numSCUs macro-average modified score with 3 models of P

YRAMID 13th in ROUGE-2 15th in ROUGE-SU4 17th in BE

Page 3: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Update summary introduction Aims to capture evolving information of a single

topic changing over time Temporal data can be considered to be composed

of many time slices

t1 t2 t3

topic

Page 4: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Question analysis

from view of data First difference: data has the temporal evolution

characteristic Deal with dynamic document collection of a single

topic in continuous periods of time from view of users

Second difference: user needs have evolution characteristic

Hope to incrementally care the important and novel information relevant to a topic

Page 5: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Challenges for update summary(extractive or generative) Content selection

Importance Redundancy Content coverage

Language quality Coherence Fluency

Just focus on the extractive content selection How to model the importance and the redundancy of topic

relevance and the content converge under the evolving data and user needs?

Page 6: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Evolutionary manifold-ranking

Spectral clustering

Combine evolutionary manifold-ranking with spectral clustering to improve the coverage of content selection!

Topic relevance

Content coverage

Temporal evolution

Explore the new manifold-ranking framework under the context of temporal data points!

New challenges

Page 7: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Evolutionary manifold-ranking

Manifold-ranking ranks the data points under the intrinsic global manifold structure by their relevance to the query

Difficulty: not model the temporally evolving characteristic, as the query is static !

Assumption of our idea Data points evolving over time have the long and narrow manifold

structure

Page 8: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Motivation of our idea Relay point of information propagation

Dynamic evolution of query Relay propagation of information

Iterative feedback mechanism in evolutionary manifold-ranking

The summary sentences from previous time slices The first sentences of documents in current time slice

Relay point of information propagation

Page 9: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Manifold-ranking: Notation

n sentencesdata points t querylabel One Affinity Matrix for data points:

W: original similarity matrix D: diagonal matrix S: normalized matrix

Labeling Matrix: Vectorial Function (ranking): Learning task:

( , )W D S,

1[ , , ]T T TnF F F

1[ , , ]T T TnY Y Y

{( , , ); }W D S Y F

n n

n n

Page 10: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Regularization framework

2 2

, 1 1

1 1 1( ) ( )

2

n n

ij i j i ii j iii jj

Q F W F F F YD D

* arg min ( )F

F Q F

Iterative form

Closed form

( 1) ( ) (1 )F t SF t Y

* 1( )F I S Y

Fitting constraint

Smoothness constraint

Page 11: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Evolutionary manifold-ranking framework

New iterative form Closed form

1 11 1[ , , ]T T TnY Y Y

* 1

1 2 3( ) ( )F I S Y Y Y

Labeling Matrix:

the original query the summary sentences from previous time slices the first sentences of documents in current time slices

2 21 2[ , , ]T T TnY Y Y

3 31 3[ , , ]T T TnY Y Y

1 2 3( 1) ( ) ( )F t + SF t Y Y Y

Iterative feedback mechanism

Page 12: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Evolutionary manifold-ranking

Spectral clustering

Topic relevance

Content coverage

Temporal evolution

New challenges

Topic relevance

Temporal evolution

Page 13: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Normalized Spectral clustering

Why choose the spectral clustering? Automatically determine the number of clusters Cluster the data points with arbitrary shape Converge to the globally optimal solution

Center object of spectral clustering Graph Laplacian transformation Select normalized random walk Laplacian

Have good convergence

Page 14: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Basic idea of spectral clustering

Good property the number of clusters is determined by the multipl

icity of the eigenvalue 0 of normalized random walk Laplacian matrix

Post processing the properties of eigenvector K-means

Page 15: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Sentence selection

no sub-topics a greedy algorithm

Sort sub-topics

Extract sentences

Penalize similarsentences

loop

Page 16: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

System design schemes

System No.\Priority Spectral clustering (post-processing) Properties of eigenvector k-means

Evolutionarymanifold-ranking

11(1) 41(2) 62(3) 

Page 17: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

System overviewInput

Sentence Splitter

Similarity Graph Similarity Graph

Spectral Clustering

Evolutionary Manifold-ranking

Order sub-topicsSelect Sentences

Output Summary

Threshold= Threshold=01

Page 18: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Evaluation rank

three top 1 average modified(pyramid) score average numSCUs macro-average modified score with 3 models of PYRAMID

13th in ROUGE-2 15th in ROUGE-SU4 17th in BE

Page 19: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Personal viewpoint

ROUGE and BE content selection of generative summary Relatively short SCU

PYRAMID content selection of extractive summary Long SCU

Hope: extend the number of time slices of evolving data

Page 20: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Conclusion Use normalized spectral clustering and evolutionary

manifold-ranking to model the new characteristics of update summary

Develop the extractive content selection method for language independence

Future work Develop high level models Better optimization method of parameters

Common topic Further explore the appropriate evaluation method for

update summary

Page 21: Reporter: Ph.d candidate He Ruifang rfhe@ir.hit      Information Retrieval Lab

哈工大信息检索研究室哈工大信息检索研究室

Thank you!

Any question?