Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data

Research output: Contribution to journal, Article, peer-reviewed

11 Citations (Scopus)

Abstract

Despite the many proposals for neural machine translation (NMT) of low-resource languages, the problem remains difficult. It becomes even more complicated when the few available resources cover only a single domain. In this paper, we discuss the applicability of source-side monolingual data in low-resource languages to improving NMT systems for such languages. In our experiments, we used Wolaytta–English as the low-resource language pair. We discuss the use of self-learning and fine-tuning approaches to improve the NMT system for Wolaytta–English translation using both authentic and synthetic datasets. The self-learning approach improved BLEU scores by +2.7 and +2.4 for Wolaytta–English and English–Wolaytta translation, respectively, over the best-performing baseline model. Further fine-tuning the best-performing self-learning model improved BLEU scores by another +1.2 and +0.6 for Wolaytta–English and English–Wolaytta translation, respectively. We reflect on our contributions and outline plans for future work in this difficult field of study.

Original language: English
Article number: 1201
Journal: Applied Sciences (Switzerland)
Volume: 13
Issue number: 2
DOI
Status: Published - Jan 2023
