Acronym Identification using Transformers and Flair Framework

F. Balouchzahi, O. Vitman, H. L. Shashirekha, G. Sidorov, A. Gelbukh

Research output: Contribution to journal › Conference article › peer-review

Abstract

The number of acronyms in texts is growing along with the number of scientific articles, and the phenomenon is not limited to English. The Acronym Extraction (AE) task aims to automatically identify and extract acronyms and their long forms from a given text. To address AE across languages, this paper describes the participation of team MUCIC in the AE shared task at the AAAI-22 Workshop on Scientific Document Understanding (SDU@AAAI-22). The shared task covers English, Spanish, French, Danish, Persian, and Vietnamese texts. The proposed methodology consists of data transformation using spaCy and/or other libraries, depending on the language, followed by fine-tuning language-specific transformers with the Flair framework to extract acronyms and their long forms. The proposed methodology ranked second for Spanish and obtained reasonable results for all other languages.
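
The data transformation step mentioned above can be pictured as turning character-offset annotations into token-level tags that a sequence tagger can learn from. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: it assumes each sample provides the text plus character spans for acronyms and long forms, it uses a simple regular-expression tokenizer in place of spaCy, and the function name `to_bio` and the tag names `short` and `long` are hypothetical.

```python
import re

def to_bio(text, short_spans, long_spans):
    """Convert character-offset annotations into token-level BIO tags."""
    # Regex tokenizer standing in for spaCy (assumption): words or single punctuation marks.
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\w+|[^\w\s]", text)]
    tags = ["O"] * len(tokens)
    # "short" and "long" are hypothetical label names chosen for this sketch.
    for label, spans in (("short", short_spans), ("long", long_spans)):
        for start, end in spans:
            first = True
            for i, (_, tok_start, tok_end) in enumerate(tokens):
                # A token belongs to a span if it lies fully inside it.
                if tok_start >= start and tok_end <= end:
                    tags[i] = ("B-" if first else "I-") + label
                    first = False
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]

text = "We use natural language processing (NLP) here."
acronym = "NLP"
long_form = "natural language processing"
# Offsets are computed from the text itself to avoid hand-counting errors.
short_spans = [(text.index(acronym), text.index(acronym) + len(acronym))]
long_spans = [(text.index(long_form), text.index(long_form) + len(long_form))]

tagged = to_bio(text, short_spans, long_spans)
for token, tag in tagged:
    print(token, tag)
```

Written out one token and tag per line, output of this kind resembles the column format that sequence-labeling toolkits such as Flair typically read, although the exact format used in the paper is not stated in the abstract.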

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 3164
State: Published - 2022
Event: 2022 Workshop on Scientific Document Understanding, SDU 2022 - Virtual, Online
Duration: 1 Mar 2022 → …

Keywords

  • Acronym
  • BERT
  • Expansion
  • Flair
