ChatCVHT: An academic advising chatbot based on semantic retrieval, with topic routing and confidence threshold calibration

Nguyen Trong Hien, Phung Quang Vinh, Le Thien Khiem, Dang Bao Dang, Nguyen Minh Tuan

Keywords: chatbot; academic advising; semantic retrieval; sentence embedding; E5; topic classification; FAISS; confidence-threshold calibration; CLARIFY.

Abstract

Academic advisors in higher education must respond to a high volume of student questions about regulations, training policies, and study planning, many of which share the same intent but differ in wording. While chatbots can scale academic support, a key risk is providing incorrect answers when queries are ambiguous or weakly supported by available evidence. We developed ChatCVHT, an academic advising chatbot that combines semantic retrieval with topic-based routing using a knowledge base of 748 question–answer pairs across eight topics. The system separates topic classification from document retrieval and introduces a confidence-based decision layer that jointly considers similarity scores, the score margin between top candidates, and predicted topic confidence to decide whether to answer or request clarification. In our experiments, multilingual-e5-small achieved stable retrieval performance (Recall@10 = 0.9782; MRR@10 = 0.8841), and multilingual-e5-small with a Logistic Regression classifier (L2 regularization, C = 3) reached Macro-F1 = 0.982 and Accuracy = 0.9853 for topic classification (5-fold cross-validation). When integrated end-to-end, the decision layer withheld responses for ~7% of queries to prioritize clarification under uncertainty, while maintaining Recall@10 = 0.916 and MRR@10 = 0.8418. Overall, ChatCVHT adopts a conservative strategy that balances coverage and reliability and supports safer deployment of academic advising chatbots where factual accuracy is critical.
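
The answer-or-clarify decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names the three signals (top similarity score, score margin between the top candidates, and predicted topic confidence) but not how they are combined, so the all-conditions-must-pass rule, the threshold values, and the names `Thresholds` and `decide` are assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values for illustration; the abstract does not report
    # the calibrated thresholds.
    min_top_score: float = 0.80   # minimum similarity of the best candidate
    min_margin: float = 0.03      # minimum gap between top-1 and top-2 scores
    min_topic_conf: float = 0.60  # minimum predicted topic probability


def decide(scores: Sequence[float], topic_conf: float,
           t: Thresholds = Thresholds()) -> str:
    """Return "ANSWER" or "CLARIFY" for one query.

    scores: similarity scores of the retrieved candidates (any order).
    topic_conf: classifier probability of the predicted topic.
    Assumed rule: answer only if all three checks pass.
    """
    if not scores:
        return "CLARIFY"  # nothing retrieved, so no evidence to answer from

    ranked = sorted(scores, reverse=True)
    top = ranked[0]
    # With a single candidate there is no competitor, so the margin check passes.
    margin = top - ranked[1] if len(ranked) > 1 else float("inf")

    confident = (
        top >= t.min_top_score
        and margin >= t.min_margin
        and topic_conf >= t.min_topic_conf
    )
    return "ANSWER" if confident else "CLARIFY"


if __name__ == "__main__":
    print(decide([0.91, 0.80, 0.50], topic_conf=0.95))  # clear winner
    print(decide([0.91, 0.90], topic_conf=0.95))        # near tie between candidates
```

Under this assumed rule, raising any threshold sends more queries to clarification, which is the coverage-versus-reliability trade-off the abstract refers to.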

DOI: 10.59715/pntjmp.5.2.12
