mBERT and Simple Post-Processing: A Baseline for Disease Mention Detection in Spanish

Antonio Tamayo; Diego A. Burgos; Alexander Gelbukh

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › Peer-review

1 Citation (Scopus)

Abstract

Automatic disease mention extraction is a relevant task due to its various applications in the medical field. During the last decade, many related works have been published, which have accelerated the progress of this research area, but most of them have been carried out in English. In this work, we propose a deep-learning baseline for this task in Spanish. We report an approach based on transfer learning using multilingual BERT and straightforward post-processing to tackle the problem. Our system does not use any external resources and relies only on efficient fine-tuning, which makes it a fair baseline (Micro F1 = 0.5456) for disease mention identification in Spanish using transformer-based models.
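The abstract does not say what the post-processing does, so the following is only a hedged illustration of one common kind of BIO clean-up, not the authors' method. It assumes a token classifier (for instance, fine-tuned mBERT) emits one BIO tag per word; the functions `repair_bio`, `bio_to_spans` and `micro_f1`, the `DIS` label and the repair rule are all hypothetical. Spans are scored with exact-match, micro-averaged F1 pooled over documents; whether this matches the official evaluation behind the reported 0.5456 is an assumption.

```python
from typing import List, Optional, Set, Tuple

# (doc_id, start_token, end_token_exclusive, label)
Span = Tuple[int, int, int, str]


def repair_bio(tags: List[str]) -> List[str]:
    """Hypothetical post-processing rule: an I-X tag that does not continue
    a B-X/I-X run is rewritten as B-X, so every entity has a proper start."""
    fixed: List[str] = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            label = tag[2:]
            if prev not in (f"B-{label}", f"I-{label}"):
                tag = f"B-{label}"
        fixed.append(tag)
        prev = tag
    return fixed


def bio_to_spans(doc_id: int, tags: List[str]) -> Set[Span]:
    """Strict BIO decoding: only B-X opens an entity; a stray I-X is dropped."""
    spans: Set[Span] = set()
    start: int = -1
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final entity
        if label is not None and tag == f"I-{label}":
            continue  # the open entity keeps growing
        if label is not None:
            spans.add((doc_id, start, i, label))
            label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def micro_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1, pooled over all documents."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented toy example; "DIS" is an assumed label name.
    tokens = ["Paciente", "con", "diabetes", "mellitus", "tipo", "2", "e", "hipertensión"]
    gold_tags = ["O", "O", "B-DIS", "I-DIS", "I-DIS", "I-DIS", "O", "B-DIS"]
    pred_tags = ["O", "O", "I-DIS", "I-DIS", "I-DIS", "I-DIS", "O", "B-DIS"]

    gold = bio_to_spans(0, gold_tags)
    raw = bio_to_spans(0, pred_tags)
    repaired = bio_to_spans(0, repair_bio(pred_tags))

    for _, s, e, lab in sorted(repaired):
        print(lab, " ".join(tokens[s:e]))
    print(micro_f1(gold, raw))       # first mention lost: recall 0.5
    print(micro_f1(gold, repaired))  # → (1.0, 1.0, 1.0)
```

Keeping the repair rule separate from decoding makes its effect measurable: in the toy example, strict decoding of the raw tags drops the first mention, while the repaired tags recover it. Mapping mBERT subword predictions back to word-level tags, which a real pipeline also needs, is not shown.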

Original language	English
Pages (from-to)	350-356
Number of pages	7
Journal	CEUR Workshop Proceedings
Volume	3180
State	Published - 2022
Event	2022 Conference and Labs of the Evaluation Forum, CLEF 2022 - Bologna, Italy Duration: 5 Sep 2022 → 8 Sep 2022

Other files and links

Link to publication in Scopus

Cite this

@article{7d989aadfb5f45c2b100a44daeece9c6,

title = "mBERT and Simple Post-Processing: A Baseline for Disease Mention Detection in Spanish",

abstract = "Automatic disease mention extraction is a relevant task due to its various applications in the medical field. During the last decade, many related works have been published, which have accelerated the progress of this research area, but most of them have been carried out in English. In this work, we propose a deep-learning baseline for this task in Spanish. We report an approach based on transfer learning using multilingual BERT and straightforward post-processing to tackle the problem. Our system does not use any external resources and relies only on efficient fine-tuning, which makes it a fair baseline (Micro F1 = 0.5456) for disease mention identification in Spanish using transformer-based models.",

keywords = "Disease mention detection, multilingual BERT, named entity recognition (NER)",

author = "Tamayo, Antonio and Burgos, {Diego A.} and Gelbukh, Alexander",

note = "Publisher Copyright: {\textcopyright} 2022 Copyright for this paper by its authors.; 2022 Conference and Labs of the Evaluation Forum, CLEF 2022; Conference date: 05-09-2022 Through 08-09-2022",

year = "2022",

language = "English",

volume = "3180",

pages = "350--356",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - mBERT and Simple Post-Processing: A Baseline for Disease Mention Detection in Spanish

T2 - 2022 Conference and Labs of the Evaluation Forum, CLEF 2022

AU - Tamayo, Antonio

AU - Burgos, Diego A.

AU - Gelbukh, Alexander

PY - 2022

Y1 - 2022

N2 - Automatic disease mention extraction is a relevant task due to its various applications in the medical field. During the last decade, many related works have been published, which have accelerated the progress of this research area, but most of them have been carried out in English. In this work, we propose a deep-learning baseline for this task in Spanish. We report an approach based on transfer learning using multilingual BERT and straightforward post-processing to tackle the problem. Our system does not use any external resources and relies only on efficient fine-tuning, which makes it a fair baseline (Micro F1 = 0.5456) for disease mention identification in Spanish using transformer-based models.

AB - Automatic disease mention extraction is a relevant task due to its various applications in the medical field. During the last decade, many related works have been published, which have accelerated the progress of this research area, but most of them have been carried out in English. In this work, we propose a deep-learning baseline for this task in Spanish. We report an approach based on transfer learning using multilingual BERT and straightforward post-processing to tackle the problem. Our system does not use any external resources and relies only on efficient fine-tuning, which makes it a fair baseline (Micro F1 = 0.5456) for disease mention identification in Spanish using transformer-based models.

KW - Disease mention detection

KW - multilingual BERT

KW - named entity recognition (NER)

UR - http://www.scopus.com/inward/record.url?scp=85136997089&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85136997089

SN - 1613-0073

VL - 3180

SP - 350

EP - 356

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

Y2 - 5 September 2022 through 8 September 2022

ER -
