Greedy Optimization Method for Extractive Summarization of Scientific Articles

Iskander Akhmetov; Alexander Gelbukh; Rustam Mussabayev

doi:10.1109/ACCESS.2021.3136302

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

This work presents a method for summarizing scientific articles from the arXiv and PubMed datasets using a greedy Extractive Summarization algorithm. We used the approach along with Variable Neighborhood Search (VNS) to learn what top line (the best achievable quality) exists for Extractive Text Summarization in terms of ROUGE scores. The algorithm first selects for the summary the sentences containing the maximum number of words with the highest TF-IDF values, and tunes the minimum document frequency parameter of the TF-IDF vectorization. As a result, the method achieves ROUGE-1/ROUGE-2 scores of 0.43/0.12 and 0.40/0.13 on the arXiv and PubMed datasets, respectively. These results are comparable to those of state-of-the-art models that use complex neural network architectures, substantial computational resources, and large amounts of training data. In contrast, our method uses a straightforward statistical inference methodology.
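The greedy selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tokenize`, `tfidf_weights`, `greedy_summary`), the `top_k` and `max_sentences` parameters, the smoothed IDF formula, and the choice to treat each sentence as a document and to count only not-yet-covered words are all assumptions made for illustration. Only the use of TF-IDF weights and a minimum document frequency threshold (`min_df` here) comes from the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(sentences, min_df=1):
    """Return a TF-IDF weight per word, treating each sentence as a document.

    Words that occur in fewer than `min_df` sentences are dropped
    (assumed analogue of the minimum document frequency parameter).
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    tf = Counter(w for toks in tokenized for w in toks)
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1
    return {
        w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1.0)
        for w in tf
        if df[w] >= min_df
    }


def greedy_summary(sentences, max_sentences=2, top_k=10, min_df=1):
    """Greedily pick sentences that cover the most uncovered high-weight words."""
    weights = tfidf_weights(sentences, min_df)
    top_words = set(sorted(weights, key=weights.get, reverse=True)[:top_k])
    token_sets = [set(tokenize(s)) for s in sentences]
    chosen, covered = [], set()
    while len(chosen) < max_sentences:
        best, best_gain = None, 0
        for i, toks in enumerate(token_sets):
            if i in chosen:
                continue
            gain = len((toks & top_words) - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence adds a new top word
            break
        chosen.append(best)
        covered |= token_sets[best] & top_words
    # Return the selected sentences in their original document order
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = [
        "Greedy search picks informative sentences.",
        "The weather was nice.",
        "Greedy search uses informative word weights.",
    ]
    print(greedy_summary(doc, max_sentences=1, top_k=3))
```

Raising `min_df` in this sketch shrinks the vocabulary to words shared across several sentences, which is one plausible way such a parameter could influence which sentences are selected.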

Original language	English
Pages (from-to)	168141-168153
Number of pages	13
Journal	IEEE Access
Volume	9
DOIs	https://doi.org/10.1109/ACCESS.2021.3136302
State	Published - 2021

Keywords

Extractive text summarization
greedy algorithm
variable neighborhood search

Cite this

@article{b0cbfcaf56244e5b88a2a5c6807b56b9,

title = "Greedy Optimization Method for Extractive Summarization of Scientific Articles",

abstract = "This work presents a method for summarizing scientific articles from the arXiv and PubMed datasets using a greedy Extractive Summarization algorithm. We used the approach along with Variable Neighborhood Search (VNS) to learn what top line (the best achievable quality) exists for Extractive Text Summarization in terms of ROUGE scores. The algorithm first selects for the summary the sentences containing the maximum number of words with the highest TF-IDF values, and tunes the minimum document frequency parameter of the TF-IDF vectorization. As a result, the method achieves ROUGE-1/ROUGE-2 scores of 0.43/0.12 and 0.40/0.13 on the arXiv and PubMed datasets, respectively. These results are comparable to those of state-of-the-art models that use complex neural network architectures, substantial computational resources, and large amounts of training data. In contrast, our method uses a straightforward statistical inference methodology.",

keywords = "Extractive text summarization, greedy algorithm, variable neighborhood search",

author = "Iskander Akhmetov and Alexander Gelbukh and Rustam Mussabayev",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2021",

doi = "10.1109/ACCESS.2021.3136302",

language = "English",

volume = "9",

pages = "168141--168153",

journal = "IEEE Access",

issn = "2169-3536",

}

TY - JOUR

T1 - Greedy Optimization Method for Extractive Summarization of Scientific Articles

AU - Akhmetov, Iskander

AU - Gelbukh, Alexander

AU - Mussabayev, Rustam

PY - 2021

Y1 - 2021

N2 - This work presents a method for summarizing scientific articles from the arXiv and PubMed datasets using a greedy Extractive Summarization algorithm. We used the approach along with Variable Neighborhood Search (VNS) to learn what top line (the best achievable quality) exists for Extractive Text Summarization in terms of ROUGE scores. The algorithm first selects for the summary the sentences containing the maximum number of words with the highest TF-IDF values, and tunes the minimum document frequency parameter of the TF-IDF vectorization. As a result, the method achieves ROUGE-1/ROUGE-2 scores of 0.43/0.12 and 0.40/0.13 on the arXiv and PubMed datasets, respectively. These results are comparable to those of state-of-the-art models that use complex neural network architectures, substantial computational resources, and large amounts of training data. In contrast, our method uses a straightforward statistical inference methodology.

AB - This work presents a method for summarizing scientific articles from the arXiv and PubMed datasets using a greedy Extractive Summarization algorithm. We used the approach along with Variable Neighborhood Search (VNS) to learn what top line (the best achievable quality) exists for Extractive Text Summarization in terms of ROUGE scores. The algorithm first selects for the summary the sentences containing the maximum number of words with the highest TF-IDF values, and tunes the minimum document frequency parameter of the TF-IDF vectorization. As a result, the method achieves ROUGE-1/ROUGE-2 scores of 0.43/0.12 and 0.40/0.13 on the arXiv and PubMed datasets, respectively. These results are comparable to those of state-of-the-art models that use complex neural network architectures, substantial computational resources, and large amounts of training data. In contrast, our method uses a straightforward statistical inference methodology.

KW - Extractive text summarization

KW - greedy algorithm

KW - variable neighborhood search

UR - http://www.scopus.com/inward/record.url?scp=85121818683&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2021.3136302

DO - 10.1109/ACCESS.2021.3136302

M3 - Article

AN - SCOPUS:85121818683

SN - 2169-3536

VL - 9

SP - 168141

EP - 168153

JO - IEEE Access

JF - IEEE Access

ER -
