A STUDY ON A VIETNAMESE-K'HO LANGUAGE TRANSLATION SYSTEM USING NEURAL MACHINE TRANSLATION
Abstract
The K'Ho language is spoken by the K'Ho ethnic group, who live in the southern Central Highlands, especially in the districts of Don Duong, Duc Trong, Di Linh, Da Huoai, and Lac Duong in Lam Dong province. The provincial People's Committee and the Ethnic Minority Committee of Lam Dong province currently encourage cadres and officials in the province to learn the K'Ho language so that they can communicate with the K'Ho people and disseminate the guidelines, policies, and laws of the Party and government. In this paper, we draw on existing K'Ho language resources and the support of many K'Ho language experts to build a Vietnamese-K'Ho bilingual corpus, contributing to the promotion and preservation of the K'Ho language. The corpus contains more than 16,000 Vietnamese-K'Ho bilingual sentence pairs, which were difficult to collect because K'Ho language resources are scarce. We then use the OpenNMT framework to build an automatic translation system trained on the collected bilingual data. The system reaches an accuracy of 56.54%, which we consider an acceptable result for automatic translation given the limited data available.