On redundancy in multi-document summarization

Hiram Calvo, Pabel Carrillo-Mendoza, Alexander Gelbukh

Research output: Contribution to journal, Article, peer-reviewed

5 citations (Scopus)

Abstract

In this paper we study how the presence or absence of redundancy across multiple related texts can be used to compute sentence relevance for extractive multi-document summarization. Two types of redundancy can be found: intra-document and inter-document. Experimenting with them brings out different kinds of information, for example: statements that are redundant across documents, which can be important because of their popularity; statements that are not redundant, which can be important because of their novelty; or statements that are redundant within each document, which can be important because a single author addresses them repeatedly. We propose an unsupervised graph-based method that generates summaries according to different redundancy strategies. We present experiments on two DUC corpora with nine different strategies for extracting information, depending on how redundancy within a document and across documents is handled. According to the DUC gold standards, we found that a generic multi-document summary should contain the most redundant (popular) information across different sources while avoiding local intra-document redundancy. We implemented a mechanism to enrich sentence rankings with redundancy, which improves the evaluation results of the summaries.
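The abstract does not spell out the scoring function, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a bag-of-words cosine-similarity graph over all sentences, where each sentence's score is a weighted sum of its similarity to sentences in other documents (inter-document redundancy) and to sentences in its own document (intra-document redundancy). The function names (`bow`, `cosine`, `score_sentences`), the degree-based score, and the reading of the nine strategies as the 3 x 3 grid of weights {+1, 0, -1} (reward, ignore, penalize) are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from itertools import product


def bow(sentence):
    """Bag-of-words vector: a Counter of lowercase word tokens."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score_sentences(docs, inter_weight, intra_weight):
    """Score each sentence by weighted degree in a sentence-similarity graph.

    docs: list of documents, each a list of sentence strings.
    inter_weight, intra_weight: hypothetical strategy knobs in {+1, 0, -1}.
    Returns (score, doc_index, sentence_index) tuples, best first.
    """
    nodes = [(d, s, bow(text)) for d, doc in enumerate(docs) for s, text in enumerate(doc)]
    results = []
    for d_i, s_i, v_i in nodes:
        inter = intra = 0.0
        for d_j, s_j, v_j in nodes:
            if (d_i, s_i) == (d_j, s_j):
                continue  # no self-loops
            sim = cosine(v_i, v_j)
            if d_i == d_j:
                intra += sim  # edge inside the same document
            else:
                inter += sim  # edge across different documents
        results.append((inter_weight * inter + intra_weight * intra, d_i, s_i))
    # Stable sort on the score only, so ties keep document order.
    return sorted(results, key=lambda r: r[0], reverse=True)


# Hypothetical set of nine strategies: every (inter_weight, intra_weight) pair.
STRATEGIES = list(product((1, 0, -1), repeat=2))

docs = [
    ["The storm hit the coast on Monday.",
     "Residents were evacuated early.",
     "Residents fled early as the storm approached."],
    ["A storm hit the coast on Monday.",
     "Power was lost in several towns."],
]
# Favor cross-document redundancy, penalize within-document redundancy.
ranking = score_sentences(docs, inter_weight=1, intra_weight=-1)
_, d, s = ranking[0]
print(docs[d][s])  # prints "A storm hit the coast on Monday."
```

Under this sketch, the configuration the abstract reports as best (keep information that is redundant across sources, avoid intra-document redundancy) corresponds to `inter_weight=1, intra_weight=-1`; an extractive summary would then take the top-ranked sentences up to a length budget.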

Original language: English
Pages (from-to): 3245-3255
Number of pages: 11
Journal: Journal of Intelligent and Fuzzy Systems
Volume: 34
Issue number: 5
Status: Published, 2018
