TY - JOUR
T1 - Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data
AU - Tonja, Atnafu Lambebo
AU - Kolesnikova, Olga
AU - Gelbukh, Alexander
AU - Sidorov, Grigori
N1 - Publisher Copyright:
© 2023 by the authors.
PY - 2023/1
Y1 - 2023/1
N2 - Despite the many proposals to solve the neural machine translation (NMT) problem of low-resource languages, it continues to be difficult. The issue becomes even more complicated when few resources cover only a single domain. In this paper, we discuss the applicability of a source-side monolingual dataset of low-resource languages to improve the NMT system for such languages. In our experiments, we used Wolaytta–English translation as a low-resource language pair. We discuss the use of self-learning and fine-tuning approaches to improve the NMT system for Wolaytta–English translation using both authentic and synthetic datasets. The self-learning approach showed +2.7 and +2.4 BLEU score improvements for Wolaytta–English and English–Wolaytta translations, respectively, over the best-performing baseline model. Further fine-tuning the best-performing self-learning model showed +1.2 and +0.6 BLEU score improvements for Wolaytta–English and English–Wolaytta translations, respectively. We reflect on our contributions and plan for the future of this difficult field of study.
AB - Despite the many proposals to solve the neural machine translation (NMT) problem of low-resource languages, it continues to be difficult. The issue becomes even more complicated when few resources cover only a single domain. In this paper, we discuss the applicability of a source-side monolingual dataset of low-resource languages to improve the NMT system for such languages. In our experiments, we used Wolaytta–English translation as a low-resource language pair. We discuss the use of self-learning and fine-tuning approaches to improve the NMT system for Wolaytta–English translation using both authentic and synthetic datasets. The self-learning approach showed +2.7 and +2.4 BLEU score improvements for Wolaytta–English and English–Wolaytta translations, respectively, over the best-performing baseline model. Further fine-tuning the best-performing self-learning model showed +1.2 and +0.6 BLEU score improvements for Wolaytta–English and English–Wolaytta translations, respectively. We reflect on our contributions and plan for the future of this difficult field of study.
KW - English–Wolaytta NMT
KW - Wolaytta–English NMT
KW - low-resource NMT
KW - monolingual data for low-resource languages
KW - neural machine translation
KW - self-learning
UR - http://www.scopus.com/inward/record.url?scp=85146667921&partnerID=8YFLogxK
U2 - 10.3390/app13021201
DO - 10.3390/app13021201
M3 - Article
AN - SCOPUS:85146667921
SN - 2076-3417
VL - 13
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 2
M1 - 1201
ER -