COMPARISON OF MACHINE LEARNING ALGORITHMS FOR SENTIMENT ANALYSIS OF VIETNAMESE YOUTUBE SUBTITLES
Abstract
YouTube has become one of the most significant online platforms, with more than a billion hours of video watched every day by a vast user base. Recently, foreign reactionary forces and extremist organizations have exploited YouTube to disseminate videos that undermine the Party, the State, and the Vietnamese military. This study analyzes Vietnamese subtitles collected from YouTube, applying machine learning algorithms to perform sentiment analysis and classify the subtitles of videos. The research offers insight into the emotions and perspectives of the online community regarding YouTube content, particularly content related to politics and society. Four machine learning algorithms are compared: Naive Bayes, Random Forest, Support Vector Machine, and Logistic Regression. Among them, Random Forest achieved the highest accuracy, 81%, surpassing the other three algorithms in analyzing the sentiment of subtitles from YouTube videos with negative content.
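As an illustration only, a comparison of the four classifier families named above could be set up roughly as follows. This is a minimal sketch using scikit-learn, not the study's actual pipeline: the toy subtitles, the binary labels, the TF-IDF features, the linear SVM variant, the two-fold cross-validation, and the helper name `compare_classifiers` are all assumptions made for the example, and its scores have no relation to the 81% reported here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy subtitles (NOT the study's data).
# Assumed labels: 1 = negative content, 0 = non-negative content.
subtitles = [
    "video này rất hay và bổ ích",
    "cảm ơn bạn đã chia sẻ thông tin hữu ích",
    "nội dung tuyệt vời và đáng xem",
    "chương trình rất thú vị và ý nghĩa",
    "nội dung này xuyên tạc và sai sự thật",
    "thông tin bịa đặt và rất tệ",
    "video này kích động và gây hại",
    "luận điệu sai trái và đáng lên án",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# One estimator per algorithm family named in the abstract.
# Hyperparameters are library defaults, chosen here only for illustration.
classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": LinearSVC(),
    "Logistic Regression": LogisticRegression(),
}


def compare_classifiers(texts, y, folds=2):
    """Return the mean cross-validated accuracy for each classifier."""
    results = {}
    for name, clf in classifiers.items():
        # TF-IDF features are an assumption; the vectorizer is refit
        # inside each fold because it is part of the pipeline.
        model = make_pipeline(TfidfVectorizer(), clf)
        scores = cross_val_score(model, texts, y, cv=folds, scoring="accuracy")
        results[name] = float(scores.mean())
    return results


results = compare_classifiers(subtitles, labels)
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2f}")
```

Wrapping the vectorizer and the classifier in one pipeline keeps the feature extraction inside each cross-validation fold, so the held-out subtitles do not influence the vocabulary or the term weights.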