CODEBOT – A VIETNAMESE CHATBOT SYSTEM FOR ANSWERING C++ AND PYTHON RELATED QUESTIONS

  • Vuong Le Minh Nguyen
  • Luong Cong Tam
  • Nguyen Viet Hung
  • Nguyen Do Thai Nguyen
  • Luong Tran Hy Hien
  • Luong Tran Ngoc Khiet
  • Phan Thi Trinh
Keywords: programming teaching; Vietnamese educational chatbot; Vietnamese natural language processing; Vietnamese question answering system

Abstract

During the fourth industrial revolution, programming ability is one of the most essential skills for young people to gain an edge over competitors in their specializations. Programming techniques are important not only to software development but also to statistical analytics and mathematical modelling in other fields of study. However, fundamental programming materials on the Internet are mostly written in English rather than Vietnamese, which creates a barrier between these materials and Vietnamese youth. This led to the idea of a simple yet effective Vietnamese question answering chatbot to engage and motivate Vietnamese students to climb the steep learning curve of programming. This paper, combining natural language processing with knowledge representation and reasoning, aimed to implement such a question answering chatbot entirely in Vietnamese to help students with their programming-related questions. A simple knowledge representation method was introduced to integrate external knowledge into the system. A knowledge reasoning and retrieval-based question answering method was also proposed to yield appropriate responses to users’ queries. The range of topics the chatbot supports is limited to C++ and Python, two of the most commonly taught programming languages in Vietnamese colleges and universities. At the heart of our chatbot, two machine learning models were designed to classify users’ intents. They were trained and evaluated on our annotated dataset, which was contributed by students from the Faculty of Information Technology, Ho Chi Minh University of Education. Our proposed models achieved high F1-scores of 0.96 and 0.99 on our evaluation dataset.

Published on
2022-05-16