Greedy Optimization Method for Extractive Summarization of Scientific Articles

Iskander Akhmetov; Alexander Gelbukh; Rustam Mussabayev

doi:10.1109/ACCESS.2021.3136302

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

This work presents a method for summarizing scientific articles from the arXiv and PubMed datasets using a greedy extractive summarization algorithm. We used the approach along with Variable Neighborhood Search (VNS) to determine the top line achievable in extractive text summarization quality, measured by ROUGE scores. The algorithm first selects for the summary the sentences from the text that contain the largest number of words with the highest TF-IDF values, and it tunes the minimum document frequency parameter of the TF-IDF vectorization. As a result, the method achieves ROUGE-1/ROUGE-2 scores of 0.43/0.12 and 0.40/0.13 on the arXiv and PubMed datasets, respectively. These results are comparable to those of state-of-the-art models that rely on complex neural network architectures, substantial computational resources, and large amounts of training data. In contrast, our method uses a straightforward statistical inference methodology.
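The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`tokenize`, `top_tfidf_words`, `greedy_summary`), the choice to treat each sentence as a "document" when counting document frequency, the smoothed IDF formula, and the coverage-based gain are all assumptions made for illustration. The `min_df` parameter plays the role of the minimum document frequency threshold mentioned above.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def top_tfidf_words(sentences, min_df=1, top_k=10):
    """Return the top_k words by TF-IDF weight.

    Assumption: each sentence counts as one document, and words that
    appear in fewer than min_df sentences are discarded.
    """
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    tf = Counter(w for tokens in token_lists for w in tokens)
    df = Counter(w for tokens in token_lists for w in set(tokens))
    scores = {
        w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1.0)  # smoothed IDF
        for w in tf
        if df[w] >= min_df
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:top_k])


def greedy_summary(sentences, max_sentences=2, min_df=1, top_k=10):
    """Greedily pick sentences that cover the most not-yet-covered top words."""
    keywords = top_tfidf_words(sentences, min_df, top_k)
    word_sets = [set(tokenize(s)) for s in sentences]
    chosen, covered = [], set()
    while len(chosen) < max_sentences:
        best, best_gain = None, 0
        for i, words in enumerate(word_sets):
            if i in chosen:
                continue
            gain = len((words & keywords) - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence adds a new keyword
            break
        chosen.append(best)
        covered |= word_sets[best] & keywords
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    text = [
        "Greedy search selects sentences.",
        "The weather was nice.",
        "Greedy search uses tfidf weights to rank sentences and words.",
    ]
    print(greedy_summary(text, max_sentences=2, min_df=2))
```

Raising `min_df` shrinks the keyword set to words shared across several sentences, which is one plausible way such a threshold could change which sentences the greedy step prefers.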

Original language	English
Pages (from-to)	168141-168153
Number of pages	13
Journal	IEEE Access
Volume	9
DOI	https://doi.org/10.1109/ACCESS.2021.3136302
Publication status	Published - 2021

Cite this

@article{b0cbfcaf56244e5b88a2a5c6807b56b9,

title = "Greedy Optimization Method for Extractive Summarization of Scientific Articles",

abstract = "This work presents a method for summarizing scientific articles from the arXiv and PubMed datasets using a greedy extractive summarization algorithm. We used the approach along with Variable Neighborhood Search (VNS) to determine the top line achievable in extractive text summarization quality, measured by ROUGE scores. The algorithm first selects for the summary the sentences from the text that contain the largest number of words with the highest TF-IDF values, and it tunes the minimum document frequency parameter of the TF-IDF vectorization. As a result, the method achieves ROUGE-1/ROUGE-2 scores of 0.43/0.12 and 0.40/0.13 on the arXiv and PubMed datasets, respectively. These results are comparable to those of state-of-the-art models that rely on complex neural network architectures, substantial computational resources, and large amounts of training data. In contrast, our method uses a straightforward statistical inference methodology.",

keywords = "Extractive text summarization, greedy algorithm, variable neighborhood search",

author = "Iskander Akhmetov and Alexander Gelbukh and Rustam Mussabayev",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2021",

doi = "10.1109/ACCESS.2021.3136302",

language = "English",

volume = "9",

pages = "168141--168153",

journal = "IEEE Access",

issn = "2169-3536",

}

TY - JOUR

T1 - Greedy Optimization Method for Extractive Summarization of Scientific Articles

AU - Akhmetov, Iskander

AU - Gelbukh, Alexander

AU - Mussabayev, Rustam

PY - 2021

Y1 - 2021

N2 - This work presents a method for summarizing scientific articles from the arXiv and PubMed datasets using a greedy extractive summarization algorithm. We used the approach along with Variable Neighborhood Search (VNS) to determine the top line achievable in extractive text summarization quality, measured by ROUGE scores. The algorithm first selects for the summary the sentences from the text that contain the largest number of words with the highest TF-IDF values, and it tunes the minimum document frequency parameter of the TF-IDF vectorization. As a result, the method achieves ROUGE-1/ROUGE-2 scores of 0.43/0.12 and 0.40/0.13 on the arXiv and PubMed datasets, respectively. These results are comparable to those of state-of-the-art models that rely on complex neural network architectures, substantial computational resources, and large amounts of training data. In contrast, our method uses a straightforward statistical inference methodology.

KW - Extractive text summarization

KW - greedy algorithm

KW - variable neighborhood search

UR - http://www.scopus.com/inward/record.url?scp=85121818683&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2021.3136302

DO - 10.1109/ACCESS.2021.3136302

M3 - Article

AN - SCOPUS:85121818683

SN - 2169-3536

VL - 9

SP - 168141

EP - 168153

JO - IEEE Access

JF - IEEE Access

ER -
