ParTNER: Paragraph Tuning for Named Entity Recognition on Clinical Cases in Spanish using mBERT + Rules

Antonio Tamayo, Diego Burgos, Alexander Gelbukh

Research output: Contribution to journal › Conference article › peer-review

Abstract

Named entity recognition (NER) and normalization are crucial tasks for information extraction in the medical domain. They have been tackled through approaches ranging from rule-based systems and classic machine learning with feature engineering to the most sophisticated deep learning models, most of them for English. In this work, we present a transfer learning approach based on multilingual BERT to tackle Spanish NER (species mentions) and normalization in clinical cases, using sentence tokenization for training and a paragraph tuning strategy at the inference phase. We propose that text lengths at the training and inference stages do not have to match and that this difference can improve the model's performance depending on the task. Our validation showed that using a context of three sentences during inference improves the F1 score by ≈1% compared to longer and shorter paragraphs and by ≈17% compared to the whole document. We also applied simple but effective post-processing rules to the model's output, which improved the micro F1 score by ≈28%. Our system achieved an F1 score of 0.8499 on the test set of the LivingNER shared task 2022.
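The paragraph tuning idea — training on sentences but running inference over three-sentence windows — can be sketched as follows. This is a minimal illustration, not the authors' code: the naive regex sentence splitter and the non-overlapping grouping are assumptions standing in for whatever tokenizer and windowing the system actually used.

```python
import re

def sentence_split(text):
    # Naive sentence splitter (assumption: a proper Spanish sentence
    # tokenizer would be used in practice; this regex stands in for it).
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def paragraph_windows(sentences, size=3):
    # Group consecutive sentences into "paragraphs" of `size` sentences,
    # the inference context the abstract reports as best-performing.
    return [" ".join(sentences[i:i + size])
            for i in range(0, len(sentences), size)]

doc = ("El paciente presenta fiebre. Se detecta Escherichia coli. "
       "Se inicia tratamiento. La evolución es favorable.")
windows = paragraph_windows(sentence_split(doc), size=3)
# Each window (up to three sentences) would then be fed to the
# fine-tuned mBERT NER model instead of the whole clinical case,
# and the model's output post-processed with the rule set.
```

The point of the sketch is only the length mismatch: the model sees single sentences at training time but three-sentence contexts at prediction time.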

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 3202
State: Published - 2022
Event: 2022 Iberian Languages Evaluation Forum, IberLEF 2022 - A Coruña, Spain
Duration: 20 Sep 2022 → …

Keywords

  • Named entity recognition
  • multilingual BERT
  • normalization
  • paragraph tuning
  • transfer learning

