TY - JOUR
T1 - Reaching for upper bound ROUGE score of extractive summarization methods
AU - Akhmetov, Iskander
AU - Mussabayev, Rustam
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © Copyright 2022 Akhmetov et al.
PY - 2022
Y1 - 2022
AB - Extractive text summarization (ETS) methods automatically find the salient information in a text by selecting exact sentences from the source text. In this article, we investigate what summary quality ETS methods can achieve. To maximize the ROUGE-1 score, we used five approaches: (1) adapted reduced variable neighborhood search (RVNS), (2) a Greedy algorithm, (3) VNS initialized with the Greedy algorithm results, (4) a genetic algorithm, and (5) a genetic algorithm initialized with the Greedy algorithm results. We ran experiments on articles from the arXiv dataset. The genetic algorithm initialized with the Greedy algorithm results yielded the best results among the tested approaches, achieving ROUGE-1 and ROUGE-2 scores of 0.59 and 0.25, respectively. These scores appear to be higher than those obtained by current state-of-the-art text summarization models: the best ROUGE-1 score reported in the literature on the same dataset is 0.46. Therefore, there is still room for the development of ETS methods, which are now undeservedly forgotten.
KW - Genetic algorithm
KW - Greedy algorithm
KW - ROUGE
KW - Text summarization
KW - Variable neighborhood search
UR - http://www.scopus.com/inward/record.url?scp=85140586779&partnerID=8YFLogxK
U2 - 10.7717/peerj-cs.1103
DO - 10.7717/peerj-cs.1103
M3 - Article
C2 - 36262160
AN - SCOPUS:85140586779
SN - 2376-5992
VL - 8
JO - PeerJ Computer Science
JF - PeerJ Computer Science
M1 - e1103
ER -