Acronym Identification using Transformers and Flair Framework

F. Balouchzahi; O. Vitman; H. L. Shashirekha; G. Sidorov; A. Gelbukh

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

Abstract

The number of acronyms in texts is growing along with the number of scientific articles, and this growth is not limited to English texts. The Acronym Extraction (AE) task aims at automatically identifying and extracting acronyms and their long forms in a given text. To tackle the challenge of AE in different languages, this paper describes the participation of the team MUCIC in the AE shared task at the AAAI-22 Workshop on Scientific Document Understanding (SDU@AAAI-22). This shared task aims at identifying and extracting acronyms and their long forms from English, Spanish, French, Danish, Persian, and Vietnamese texts. The proposed methodology consists of data transformation using spaCy and/or other libraries, depending on the language, followed by fine-tuning the transformer models of the corresponding languages with the Flair framework to extract acronyms and their long forms. For Spanish, the proposed methodology secured the second rank; for all other languages, the results obtained are reasonable.
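As a rough illustration of the data transformation step mentioned in the abstract, the sketch below converts character-offset annotations into token-level tags of the kind a sequence tagger is usually trained on. The abstract does not state the tagging scheme or the input layout, so the BIO scheme, the `short`/`long` label names, and the `spans_to_bio` helper are all assumptions made for illustration. A plain whitespace tokenizer stands in for spaCy to keep the example dependency-free.

```python
import re


def spans_to_bio(text, acronyms, long_forms):
    """Convert character-offset spans into (token, BIO tag) pairs.

    Hypothetical helper: the label names and the BIO scheme are assumptions,
    not taken from the paper. Spans are (start, end) with an exclusive end.
    """
    labeled = [(s, e, "short") for s, e in acronyms]
    labeled += [(s, e, "long") for s, e in long_forms]

    rows = []
    open_span = None  # span that the previous token belonged to, if any
    # Whitespace tokenization is a stand-in for a real tokenizer such as spaCy.
    for match in re.finditer(r"\S+", text):
        tag, current = "O", None
        for span in labeled:
            start, end, label = span
            # A token belongs to a span when their character ranges overlap.
            if match.start() < end and match.end() > start:
                prefix = "I" if span == open_span else "B"
                tag, current = f"{prefix}-{label}", span
                break
        rows.append((match.group(), tag))
        open_span = current
    return rows


if __name__ == "__main__":
    sentence = "We use a support vector machine (SVM) here."
    lf_start = sentence.index("support vector machine")
    ac_start = sentence.index("SVM")
    tagged = spans_to_bio(
        sentence,
        acronyms=[(ac_start, ac_start + len("SVM"))],
        long_forms=[(lf_start, lf_start + len("support vector machine"))],
    )
    # One "token tag" pair per line, a layout that column-style corpus
    # readers commonly accept.
    for token, tag in tagged:
        print(token, tag)
```

Because the tokenizer splits only on whitespace, punctuation stays attached to the token it touches, so the acronym is tagged on the token `(SVM)`. A language-aware tokenizer would separate the parentheses, which is one reason the choice of tokenizer can differ per language.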

Original language	English
Journal	CEUR Workshop Proceedings
Volume	3164
State	Published - 2022
Event	2022 Workshop on Scientific Document Understanding, SDU 2022 (Virtual, Online). Duration: 1 Mar 2022 → …

Keywords

Acronym
BERT
Expansion
Flair

Cite this

@article{3cdea4df0f554cddbe82b1e36fbc259d,

title = "Acronym Identification using Transformers and Flair Framework",

abstract = "The amount of acronyms in texts is growing with the increase in the number of scientific articles and it is not bound only to English texts. The Acronym Extraction (AE) task aims at automatically identifying and extracting the acronyms and their long forms in the given text. To tackle the challenge of AE in different languages, this paper describes the participation of the team MUCIC in the AE shared task at the AAAI-22 Workshop on Scientific Document Understanding (SDU@AAAI-22). This shared task aims at identifying and extracting acronyms and their long forms from English, Spanish, French, Danish, Persian, and Vietnamese texts. The proposed methodology consists of data transformation using Spacy and/or other libraries depending on the language and a Flair framework to fine-tune the transformers of the corresponding languages to extract acronyms and their long-forms. For the Spanish language, the proposed methodology secured the second rank and for all other languages, the results obtained are reasonable.",

keywords = "Acronym, BERT, Expansion, Flair",

author = "F. Balouchzahi and O. Vitman and Shashirekha, {H. L.} and G. Sidorov and A. Gelbukh",

note = "Publisher Copyright: {\textcopyright} 2021 Copyright for this paper by its authors.; 2022 Workshop on Scientific Document Understanding, SDU 2022 ; Conference date: 01-03-2022",

year = "2022",

language = "English",

volume = "3164",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - Acronym Identification using Transformers and Flair Framework

AU - Balouchzahi, F.

AU - Vitman, O.

AU - Shashirekha, H. L.

AU - Sidorov, G.

AU - Gelbukh, A.

PY - 2022

Y1 - 2022

N2 - The amount of acronyms in texts is growing with the increase in the number of scientific articles and it is not bound only to English texts. The Acronym Extraction (AE) task aims at automatically identifying and extracting the acronyms and their long forms in the given text. To tackle the challenge of AE in different languages, this paper describes the participation of the team MUCIC in the AE shared task at the AAAI-22 Workshop on Scientific Document Understanding (SDU@AAAI-22). This shared task aims at identifying and extracting acronyms and their long forms from English, Spanish, French, Danish, Persian, and Vietnamese texts. The proposed methodology consists of data transformation using Spacy and/or other libraries depending on the language and a Flair framework to fine-tune the transformers of the corresponding languages to extract acronyms and their long-forms. For the Spanish language, the proposed methodology secured the second rank and for all other languages, the results obtained are reasonable.

AB - The amount of acronyms in texts is growing with the increase in the number of scientific articles and it is not bound only to English texts. The Acronym Extraction (AE) task aims at automatically identifying and extracting the acronyms and their long forms in the given text. To tackle the challenge of AE in different languages, this paper describes the participation of the team MUCIC in the AE shared task at the AAAI-22 Workshop on Scientific Document Understanding (SDU@AAAI-22). This shared task aims at identifying and extracting acronyms and their long forms from English, Spanish, French, Danish, Persian, and Vietnamese texts. The proposed methodology consists of data transformation using Spacy and/or other libraries depending on the language and a Flair framework to fine-tune the transformers of the corresponding languages to extract acronyms and their long-forms. For the Spanish language, the proposed methodology secured the second rank and for all other languages, the results obtained are reasonable.

KW - Acronym

KW - BERT

KW - Expansion

KW - Flair

UR - http://www.scopus.com/inward/record.url?scp=85134532746&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85134532746

SN - 1613-0073

VL - 3164

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 2022 Workshop on Scientific Document Understanding, SDU 2022

Y2 - 1 March 2022

ER -
