On redundancy in multi-document summarization

Hiram Calvo; Pabel Carrillo-Mendoza; Alexander Gelbukh

doi:10.3233/JIFS-169507

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

In this paper we study how the presence or absence of redundancy across multiple related texts can be used to compute sentence relevance for extractive multi-document summarization. Two types of redundancy can be found: intra-document and inter-document. Experimenting with them suggests different notions of importance, for example: statements redundant between documents, which can be important because of their popularity; statements that are not redundant, which can be important because of their novelty; or statements redundant within each document, which can be important because a single author addresses them repeatedly. We propose an unsupervised graph-based method that generates summaries based on different redundancy strategies. We present experiments on two DUC corpora with nine different strategies for extracting information, depending on how redundancy within a document and across different documents is managed. According to DUC gold standards, we found that a multi-document generic summary should contain the most redundant (popular) information across different sources while avoiding local intra-document redundancy. We implemented a mechanism to enrich sentence rankings with redundancy, which improved the summaries' evaluation scores.
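The abstract describes the method only at a high level, so the following Python sketch illustrates the general idea under explicitly hypothetical assumptions; it is not the authors' implementation. Sentences are nodes of a similarity graph, every edge is classified as intra-document (both sentences come from the same source) or inter-document, and each sentence's redundancy of each kind is the sum of its edge weights. Plain bag-of-words cosine similarity stands in for the Doc2vec sentence vectors listed among the keywords, and the linear score with weights in {-1, 0, +1} (penalize, ignore, or favor each redundancy type, giving 3 x 3 = 9 combinations) is only one plausible reading of the "nine strategies". The names `STRATEGIES`, `bag_of_words`, `cosine`, `redundancy_scores`, and `rank` are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import product

# HYPOTHETICAL: weight pairs (w_intra, w_inter), each in {-1, 0, +1}
# (penalize, ignore, favor), as one guess at the paper's nine strategies.
STRATEGIES = list(product((-1, 0, 1), repeat=2))


def bag_of_words(sentence: str) -> Counter:
    """Lower-cased token counts; a stand-in for Doc2vec sentence vectors."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def redundancy_scores(docs):
    """Sum each sentence's edge weights in a complete similarity graph,
    split into intra-document and inter-document redundancy.

    `docs` is a list of documents, each a list of sentence strings;
    both result dicts are keyed by (doc_index, sentence_index).
    """
    nodes = [(d, s, bag_of_words(text))
             for d, doc in enumerate(docs) for s, text in enumerate(doc)]
    intra = {(d, s): 0.0 for d, s, _ in nodes}
    inter = dict(intra)
    for i, (d1, s1, v1) in enumerate(nodes):
        for d2, s2, v2 in nodes[i + 1:]:
            weight = cosine(v1, v2)  # undirected edge weight
            bucket = intra if d1 == d2 else inter
            bucket[(d1, s1)] += weight
            bucket[(d2, s2)] += weight
    return intra, inter


def rank(docs, w_intra, w_inter):
    """Order sentence keys by a HYPOTHETICAL linear redundancy score, best first.
    Ties are broken by (doc_index, sentence_index)."""
    intra, inter = redundancy_scores(docs)
    score = {k: w_intra * intra[k] + w_inter * inter[k] for k in intra}
    return sorted(score, key=lambda k: (-score[k], k))


if __name__ == "__main__":
    docs = [
        ["The storm hit the coast on Monday.", "Thousands lost power.",
         "Thousands lost power in the city."],
        ["A storm hit the coast on Monday night.", "Schools were closed."],
    ]
    for w_intra, w_inter in STRATEGIES:
        d, s = rank(docs, w_intra, w_inter)[0]
        print(f"w_intra={w_intra:+d} w_inter={w_inter:+d}: {docs[d][s]}")
```

With `w_intra=-1, w_inter=1`, the ranking mirrors the abstract's finding: sentences echoed by other sources rise, while sentences that repeat material within their own document sink.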

Original language	English
Pages (from-to)	3245-3255
Number of pages	11
Journal	Journal of Intelligent and Fuzzy Systems
Volume	34
Issue number	5
DOI	https://doi.org/10.3233/JIFS-169507
State	Published - 2018

Keywords

Doc2vec
Multi-document summarization
Sentence redundancy
Similarity graphs
Unsupervised summarization

Cite this

@article{1bd051349aa44db682b04281bfda24db,

title = "On redundancy in multi-document summarization",

abstract = "In this paper we study how the presence or absence of redundancy across multiple related texts can be used to compute sentence relevance for extractive multi-document summarization. Two types of redundancy can be found: intra-document and inter-document. Experimenting with them suggests different notions of importance, for example: statements redundant between documents, which can be important because of their popularity; statements that are not redundant, which can be important because of their novelty; or statements redundant within each document, which can be important because a single author addresses them repeatedly. We propose an unsupervised graph-based method that generates summaries based on different redundancy strategies. We present experiments on two DUC corpora with nine different strategies for extracting information, depending on how redundancy within a document and across different documents is managed. According to DUC gold standards, we found that a multi-document generic summary should contain the most redundant (popular) information across different sources while avoiding local intra-document redundancy. We implemented a mechanism to enrich sentence rankings with redundancy, which improved the summaries' evaluation scores.",

keywords = "Doc2vec, Multi-document summarization, Sentence redundancy, Similarity graphs, Unsupervised summarization",

author = "Hiram Calvo and Pabel Carrillo-Mendoza and Alexander Gelbukh",

year = "2018",

doi = "10.3233/JIFS-169507",

language = "English",

volume = "34",

pages = "3245--3255",

journal = "Journal of Intelligent and Fuzzy Systems",

issn = "1064-1246",

number = "5",

}

TY - JOUR

T1 - On redundancy in multi-document summarization

AU - Calvo, Hiram

AU - Carrillo-Mendoza, Pabel

AU - Gelbukh, Alexander

PY - 2018

Y1 - 2018

N2 - In this paper we study how the presence or absence of redundancy across multiple related texts can be used to compute sentence relevance for extractive multi-document summarization. Two types of redundancy can be found: intra-document and inter-document. Experimenting with them suggests different notions of importance, for example: statements redundant between documents, which can be important because of their popularity; statements that are not redundant, which can be important because of their novelty; or statements redundant within each document, which can be important because a single author addresses them repeatedly. We propose an unsupervised graph-based method that generates summaries based on different redundancy strategies. We present experiments on two DUC corpora with nine different strategies for extracting information, depending on how redundancy within a document and across different documents is managed. According to DUC gold standards, we found that a multi-document generic summary should contain the most redundant (popular) information across different sources while avoiding local intra-document redundancy. We implemented a mechanism to enrich sentence rankings with redundancy, which improved the summaries' evaluation scores.

AB - In this paper we study how the presence or absence of redundancy across multiple related texts can be used to compute sentence relevance for extractive multi-document summarization. Two types of redundancy can be found: intra-document and inter-document. Experimenting with them suggests different notions of importance, for example: statements redundant between documents, which can be important because of their popularity; statements that are not redundant, which can be important because of their novelty; or statements redundant within each document, which can be important because a single author addresses them repeatedly. We propose an unsupervised graph-based method that generates summaries based on different redundancy strategies. We present experiments on two DUC corpora with nine different strategies for extracting information, depending on how redundancy within a document and across different documents is managed. According to DUC gold standards, we found that a multi-document generic summary should contain the most redundant (popular) information across different sources while avoiding local intra-document redundancy. We implemented a mechanism to enrich sentence rankings with redundancy, which improved the summaries' evaluation scores.

KW - Doc2vec

KW - Multi-document summarization

KW - Sentence redundancy

KW - Similarity graphs

KW - Unsupervised summarization

UR - http://www.scopus.com/inward/record.url?scp=85063474088&partnerID=8YFLogxK

U2 - 10.3233/JIFS-169507

DO - 10.3233/JIFS-169507

M3 - Article

SN - 1064-1246

VL - 34

SP - 3245

EP - 3255

JO - Journal of Intelligent and Fuzzy Systems

JF - Journal of Intelligent and Fuzzy Systems

IS - 5

ER -
