mBERT and Simple Post-Processing: A Baseline for Disease Mention Detection in Spanish

Antonio Tamayo, Diego A. Burgos, Alexander Gelbukh

Research output: Contribution to journal › Conference article › peer-review

Abstract

Automatic disease mention extraction is a relevant task due to its various applications in the medical field. During the last decade, many related works have been published, which have accelerated progress in this research area, but most of them have been carried out in English. In this work, we propose a deep-learning baseline for this task in Spanish. We report an approach based on transfer learning using multilingual BERT and a straightforward post-processing step to tackle the problem. Our system does not use any external resources and relies only on efficient fine-tuning, which makes it a fair baseline (Micro F1 = 0.5456) for disease mention identification in Spanish using transformer-based models.
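
The baseline score above is a micro-averaged F1. The following is a minimal sketch of how such a score can be computed, assuming exact-match character spans pooled over all documents; the abstract does not state the matching criterion, so that choice, the `Span` type, and the function name `micro_f1` are illustrative assumptions rather than the authors' evaluation code.

```python
from typing import List, Set, Tuple

# Hypothetical representation: one mention = (start_offset, end_offset).
Span = Tuple[int, int]


def micro_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> float:
    """Micro-averaged F1 over exact-match spans, pooled across documents."""
    # Counts are summed over all documents before computing the ratios,
    # which is what distinguishes micro from macro averaging.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy data (made up for illustration): two documents, three gold mentions.
gold = [{(0, 8), (20, 31)}, {(5, 12)}]
pred = [{(0, 8)}, {(5, 12), (30, 40)}]
score = micro_f1(gold, pred)  # 2 true positives, 3 predicted, 3 gold
```

In this toy case precision and recall are both 2/3, so the score is also 2/3. Because counts are pooled before the ratios are taken, documents with many mentions weigh more than documents with few.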

Original language: English
Pages (from-to): 350-356
Number of pages: 7
Journal: CEUR Workshop Proceedings
Volume: 3180
State: Published - 2022
Event: 2022 Conference and Labs of the Evaluation Forum, CLEF 2022 - Bologna, Italy
Duration: 5 Sep 2022 to 8 Sep 2022

Keywords

  • Disease mention detection
  • multilingual BERT
  • named entity recognition (NER)
