On redundancy in multi-document summarization

Hiram Calvo, Pabel Carrillo-Mendoza, Alexander Gelbukh

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

In this paper we study how the presence or absence of redundancy across multiple related texts can be used to compute sentence relevance for extractive multi-document summarization. Two types of redundancy can be found: intra-document and inter-document. Experimenting with them surfaces different kinds of information, for example: statements redundant between documents, which can be important because of their popularity; statements that are not redundant, which can be important because of their novelty; or statements redundant within each document, which can be important because a single author addresses them repeatedly. We propose an unsupervised graph-based method that generates summaries based on different redundancy strategies. We present experiments on two DUC corpora with nine different strategies for extracting information, depending on how redundancy within a document and across documents is handled. According to the DUC gold standards, we found that a generic multi-document summary should contain the most redundant (popular) information across different sources while avoiding local intra-document redundancy. We implemented a mechanism to enrich sentence rankings with redundancy, which improved the evaluation scores of the summaries.
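The distinction between inter-document and intra-document redundancy can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes bag-of-words cosine similarity as a stand-in for the Doc2vec representations named in the keywords, and the function names and the `inter_weight` / `intra_weight` parameters are hypothetical, chosen only to show how one strategy (reward redundancy across documents, penalize it within a document) might be scored on a sentence similarity graph.

```python
import math
import re
from collections import Counter


def bow(sentence):
    """Lowercased bag-of-words vector (assumed stand-in for Doc2vec embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def redundancy_scores(docs, inter_weight=1.0, intra_weight=-1.0):
    """Rank sentences by weighted inter- and intra-document redundancy.

    docs: list of documents, each a list of sentence strings.
    The weights are hypothetical knobs: a positive weight rewards that kind
    of redundancy, a negative one penalizes it, and zero ignores it.
    Returns (doc_index, sentence_index, score) tuples, best first.
    """
    # Every sentence is a node; edges are pairwise similarities.
    nodes = [
        (d, s, bow(text))
        for d, doc in enumerate(docs)
        for s, text in enumerate(doc)
    ]
    scored = []
    for d1, s1, v1 in nodes:
        inter = intra = 0.0
        for d2, s2, v2 in nodes:
            if (d1, s1) == (d2, s2):
                continue  # skip the self-edge
            sim = cosine(v1, v2)
            if d1 == d2:
                intra += sim  # redundancy within the same document
            else:
                inter += sim  # redundancy across documents
        scored.append((d1, s1, inter_weight * inter + intra_weight * intra))
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Toy input: two short "documents" about the same event.
docs = [
    ["The storm hit the coast on Monday.",
     "The storm hit the coast hard.",
     "Schools were closed."],
    ["The storm hit the coast on Monday.",
     "Power was restored by Friday."],
]
ranking = redundancy_scores(docs)
for doc_idx, sent_idx, score in ranking:
    print(f"{score:6.3f}  {docs[doc_idx][sent_idx]}")
```

In this toy input, the sentence repeated across both documents and not echoed inside its own document ranks first. Changing the signs of the two weights gives other combinations (for example, favoring novelty by penalizing inter-document redundancy), but the exact nine strategies evaluated in the paper are not specified here.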

Original language: English
Pages (from-to): 3245-3255
Number of pages: 11
Journal: Journal of Intelligent and Fuzzy Systems
Volume: 34
Issue number: 5
State: Published - 2018

Keywords

  • Doc2vec
  • Multi-document summarization
  • Sentence redundancy
  • Similarity graphs
  • Unsupervised summarization
