ParTNER: Paragraph Tuning for Named Entity Recognition on Clinical Cases in Spanish using mBERT + Rules

Antonio Tamayo, Diego Burgos, Alexander Gelbukh

Research output: Contribution to journal › Conference article › peer review

Abstract

Named entity recognition (NER) and normalization are crucial tasks for information extraction in the medical field. They have been tackled with approaches ranging from rule-based systems and classic machine learning methods with feature engineering to sophisticated deep learning models, most of them for English. In this work, we present a transfer learning approach based on multilingual BERT to tackle Spanish NER (species mentions) and normalization in clinical cases, using sentence-level tokenization for training and a paragraph tuning strategy at inference. We propose that text lengths at the training and inference stages do not have to match and that this difference can be leveraged to improve the model's performance depending on the task. Our validation showed that using a context of three sentences during inference improves the F1 score by ≈1% compared to longer and shorter paragraphs and by ≈17% compared to the whole document. We also applied simple but effective post-processing rules to the model's output, which improved the micro F1 score by ≈28%. Our system achieved an F1 of 0.8499 on the test set of the LivingNER 2022 shared task.
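The abstract describes training the NER model on single sentences while running inference on three-sentence paragraphs. Below is a minimal, hypothetical sketch of that inference-time grouping, assuming non-overlapping three-sentence windows and a generic sentence-level NER callable; the function names, window handling, and toy model are illustrative and not taken from the authors' implementation.

```python
# Hedged sketch of "paragraph tuning" at inference: the model is fine-tuned on
# single sentences, but each prediction is made over a three-sentence paragraph.
# All names here (three_sentence_windows, predict_with_context, toy_ner) are
# illustrative assumptions, not the authors' code.
from typing import Callable, List, Tuple


def three_sentence_windows(sentences: List[str], window: int = 3) -> List[List[str]]:
    """Group consecutive sentences into non-overlapping windows of `window` sentences."""
    return [sentences[i:i + window] for i in range(0, len(sentences), window)]


def predict_with_context(
    sentences: List[str],
    ner_model: Callable[[str], List[Tuple[str, str]]],  # text -> [(mention, label), ...]
    window: int = 3,
) -> List[Tuple[str, str]]:
    """Run NER over three-sentence paragraphs instead of isolated sentences or the whole document."""
    predictions: List[Tuple[str, str]] = []
    for group in three_sentence_windows(sentences, window):
        paragraph = " ".join(group)  # the inference unit is a short paragraph
        predictions.extend(ner_model(paragraph))
    return predictions


if __name__ == "__main__":
    # Placeholder "model": tags any capitalized token; stands in for the mBERT tagger.
    def toy_ner(text: str) -> List[Tuple[str, str]]:
        return [(tok, "SPECIES") for tok in text.split() if tok[0].isupper()]

    doc = [
        "El paciente presenta fiebre.",
        "Se detecta Escherichia coli.",
        "Se inicia tratamiento.",
        "Evolución favorable.",
    ]
    print(predict_with_context(doc, toy_ner))
```

The three-sentence window size follows the validation result quoted in the abstract; longer or shorter windows, or the full document, reportedly performed worse.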

Original language: English
Publication: CEUR Workshop Proceedings
Volume: 3202
Status: Published - 2022
Event: 2022 Iberian Languages Evaluation Forum, IberLEF 2022 - A Coruña, Spain
Duration: 20 Sep 2022 → …
