Reaching for upper bound ROUGE score of extractive summarization methods

Iskander Akhmetov; Rustam Mussabayev; Alexander Gelbukh

doi:10.7717/peerj-cs.1103

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Extractive text summarization (ETS) methods find the salient information in a text automatically by selecting exact sentences from the source text. In this article, we ask what summary quality ETS methods can achieve. To maximize the ROUGE-1 score, we used five approaches: (1) adapted reduced variable neighborhood search (RVNS), (2) a greedy algorithm, (3) VNS initialized by the greedy algorithm's results, (4) a genetic algorithm, and (5) a genetic algorithm initialized by the greedy algorithm's results. We ran experiments on articles from the arXiv dataset. We found that scores of 0.59 for ROUGE-1 and 0.25 for ROUGE-2 are achievable by the genetic algorithm initialized by the greedy algorithm's results, which yielded the best results of the tested approaches. Moreover, these scores appear to be higher than those obtained by the current state-of-the-art text summarization models: the best ROUGE-1 score in the literature on the same dataset is 0.46. Therefore, there is room for the development of ETS methods, which are now undeservedly forgotten.
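As an illustration of the greedy approach named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization, a simplified unigram-overlap F1 in place of a full ROUGE-1 implementation, and a hypothetical `max_sentences` cap; the abstract does not specify any of these details. The function names `rouge1_f1` and `greedy_select` are invented for this example.

```python
from collections import Counter


def rouge1_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1 between two token lists (a simplified ROUGE-1)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    # Clipped unigram matches: each reference token can be matched once.
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def greedy_select(sentences, reference, max_sentences=3):
    """Greedily add the source sentence that most improves the score.

    Assumption: selection stops when no remaining sentence raises the
    score or when max_sentences (a hypothetical cap) is reached.
    """
    ref_tokens = reference.lower().split()
    chosen = []
    best_score = 0.0
    while len(chosen) < max_sentences:
        best_idx = None
        for i in range(len(sentences)):
            if i in chosen:
                continue
            # Keep selected sentences in source order when scoring.
            trial = sorted(chosen + [i])
            tokens = " ".join(sentences[j] for j in trial).lower().split()
            score = rouge1_f1(tokens, ref_tokens)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:
            break  # no candidate improves the current summary
        chosen.append(best_idx)
    return sorted(chosen), best_score


if __name__ == "__main__":
    source = [
        "the cat sat on the mat",
        "dogs bark loudly at night",
        "a cat on a mat",
    ]
    print(greedy_select(source, "the cat sat on the mat"))
```

Because the reference summary is used during selection, a procedure of this kind estimates an achievable ceiling for extractive methods rather than serving as a deployable summarizer. In the abstract, the greedy result also seeds the VNS and genetic-algorithm searches.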

Original language	English
Article number	e1103
Journal	PeerJ Computer Science
Volume	8
DOI	https://doi.org/10.7717/peerj-cs.1103
Publication status	Published - 2022


Cite this

@article{60074293a4ea4eccadcd2d6a2e09a23f,

title = "Reaching for upper bound ROUGE score of extractive summarization methods",

abstract = "Extractive text summarization (ETS) methods find the salient information in a text automatically by selecting exact sentences from the source text. In this article, we ask what summary quality ETS methods can achieve. To maximize the ROUGE-1 score, we used five approaches: (1) adapted reduced variable neighborhood search (RVNS), (2) a greedy algorithm, (3) VNS initialized by the greedy algorithm's results, (4) a genetic algorithm, and (5) a genetic algorithm initialized by the greedy algorithm's results. We ran experiments on articles from the arXiv dataset. We found that scores of 0.59 for ROUGE-1 and 0.25 for ROUGE-2 are achievable by the genetic algorithm initialized by the greedy algorithm's results, which yielded the best results of the tested approaches. Moreover, these scores appear to be higher than those obtained by the current state-of-the-art text summarization models: the best ROUGE-1 score in the literature on the same dataset is 0.46. Therefore, there is room for the development of ETS methods, which are now undeservedly forgotten.",

keywords = "Genetic algorithm, Greedy algorithm, Rouge, Text summarization, Variable neighborhood search",

author = "Iskander Akhmetov and Rustam Mussabayev and Alexander Gelbukh",

year = "2022",

doi = "10.7717/peerj-cs.1103",

language = "English",

volume = "8",

journal = "PeerJ Computer Science",

issn = "2376-5992",

publisher = "PeerJ Inc.",

}

TY - JOUR

T1 - Reaching for upper bound ROUGE score of extractive summarization methods

AU - Akhmetov, Iskander

AU - Mussabayev, Rustam

AU - Gelbukh, Alexander

PY - 2022

Y1 - 2022

N2 - Extractive text summarization (ETS) methods find the salient information in a text automatically by selecting exact sentences from the source text. In this article, we ask what summary quality ETS methods can achieve. To maximize the ROUGE-1 score, we used five approaches: (1) adapted reduced variable neighborhood search (RVNS), (2) a greedy algorithm, (3) VNS initialized by the greedy algorithm's results, (4) a genetic algorithm, and (5) a genetic algorithm initialized by the greedy algorithm's results. We ran experiments on articles from the arXiv dataset. We found that scores of 0.59 for ROUGE-1 and 0.25 for ROUGE-2 are achievable by the genetic algorithm initialized by the greedy algorithm's results, which yielded the best results of the tested approaches. Moreover, these scores appear to be higher than those obtained by the current state-of-the-art text summarization models: the best ROUGE-1 score in the literature on the same dataset is 0.46. Therefore, there is room for the development of ETS methods, which are now undeservedly forgotten.

AB - Extractive text summarization (ETS) methods find the salient information in a text automatically by selecting exact sentences from the source text. In this article, we ask what summary quality ETS methods can achieve. To maximize the ROUGE-1 score, we used five approaches: (1) adapted reduced variable neighborhood search (RVNS), (2) a greedy algorithm, (3) VNS initialized by the greedy algorithm's results, (4) a genetic algorithm, and (5) a genetic algorithm initialized by the greedy algorithm's results. We ran experiments on articles from the arXiv dataset. We found that scores of 0.59 for ROUGE-1 and 0.25 for ROUGE-2 are achievable by the genetic algorithm initialized by the greedy algorithm's results, which yielded the best results of the tested approaches. Moreover, these scores appear to be higher than those obtained by the current state-of-the-art text summarization models: the best ROUGE-1 score in the literature on the same dataset is 0.46. Therefore, there is room for the development of ETS methods, which are now undeservedly forgotten.

KW - Genetic algorithm

KW - Greedy algorithm

KW - Rouge

KW - Text summarization

KW - Variable neighborhood search

UR - http://www.scopus.com/inward/record.url?scp=85140586779&partnerID=8YFLogxK

U2 - 10.7717/peerj-cs.1103

DO - 10.7717/peerj-cs.1103

M3 - Article

C2 - 36262160

AN - SCOPUS:85140586779

SN - 2376-5992

VL - 8

JO - PeerJ Computer Science

JF - PeerJ Computer Science

M1 - e1103

ER -
