Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data

Research output: Contribution to journal (Article, peer-reviewed)

11 Scopus citations

Abstract

Despite many proposed solutions, neural machine translation (NMT) for low-resource languages remains difficult. The problem becomes even harder when the few available resources cover only a single domain. In this paper, we discuss how a source-side monolingual dataset of a low-resource language can be used to improve the NMT system for that language. In our experiments, we used Wolaytta–English translation as the low-resource language pair. We discuss the use of self-learning and fine-tuning approaches to improve the NMT system for Wolaytta–English translation using both authentic and synthetic datasets. Over the best-performing baseline model, the self-learning approach yielded BLEU score improvements of +2.7 for Wolaytta–English and +2.4 for English–Wolaytta translation. Further fine-tuning the best-performing self-learning model yielded BLEU score improvements of +1.2 for Wolaytta–English and +0.6 for English–Wolaytta translation. We reflect on our contributions and outline plans for future work in this challenging field of study.
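The self-learning idea summarized in the abstract can be sketched as a forward-translation loop: a baseline model trained only on authentic parallel data translates source-side monolingual sentences, and the resulting synthetic pairs are added to the authentic data for retraining. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the model architecture, the number of self-learning rounds, or any filtering of synthetic pairs. The word-lookup "model" (`train_toy_model`, `translate_toy`) and the `self_learning` helper are hypothetical stand-ins for a real NMT toolkit, and the later fine-tuning step is not shown.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]   # (source sentence, target sentence)
Model = Dict[str, str]   # hypothetical toy "model": source word -> target word


def train_toy_model(pairs: List[Pair]) -> Model:
    """Hypothetical stand-in for NMT training (not a real NMT system).

    Counts position-aligned word co-occurrences and keeps, for each source
    word, the target word it was most often aligned with.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for src, tgt in pairs:
        for s_tok, t_tok in zip(src.split(), tgt.split()):
            counts[s_tok][t_tok] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def translate_toy(model: Model, sentence: str) -> str:
    """Word-by-word lookup; unknown words are copied through unchanged."""
    return " ".join(model.get(tok, tok) for tok in sentence.split())


def self_learning(
    parallel: List[Pair],
    mono_src: List[str],
    train: Callable[[List[Pair]], Model],
    translate: Callable[[Model, str], str],
    rounds: int = 1,
) -> Model:
    """Self-learning with source-side monolingual data (forward translation)."""
    # 1. Baseline: train on the authentic parallel data only.
    model = train(parallel)
    for _ in range(rounds):
        # 2. Translate source-side monolingual sentences into synthetic targets.
        synthetic = [(src, translate(model, src)) for src in mono_src]
        # 3. Retrain on authentic + synthetic pairs combined.
        model = train(parallel + synthetic)
    return model


# Usage with tiny made-up data (placeholder tokens, not Wolaytta or English).
parallel = [("a b", "x y"), ("b c", "y z")]
mono_src = ["a c", "c a b"]
model = self_learning(parallel, mono_src, train_toy_model, translate_toy)
print(translate_toy(model, "c b a"))  # z y x
```

The toy model gains nothing from its own synthetic pairs. The sketch only shows the data flow, and a real NMT model is what would benefit from seeing more source-side text. Passing `train` and `translate` as callables keeps the loop independent of any particular toolkit.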

Original language: English
Article number: 1201
Journal: Applied Sciences (Switzerland)
Volume: 13
Issue number: 2
State: Published - Jan 2023

Keywords

  • English–Wolaytta NMT
  • Wolaytta–English NMT
  • low-resource NMT
  • monolingual data for low-resource languages
  • neural machine translation
  • self-learning
