IMPROVING PERFORMANCE OF VIETNAMESE SPEAKER RECOGNITION USING TRANSFER LEARNING AND ENSEMBLE EMBEDDING

Tan Hoang Ho; Cao Truong Tran

Tan Hoang Ho: Institute of Information and Communication Technology, Le Quy Don Technical University
Cao Truong Tran: Institute of Information and Communication Technology, Le Quy Don Technical University

Keywords: Speaker recognition, speaker verification, speaker identification, transfer learning, representation learning

Abstract

Speaker recognition technology identifies or verifies individuals from their unique vocal characteristics, such as pitch, tone, and speaking style. It is widely used to enhance security, improve customer service, support law enforcement, and personalize interactions with smart devices. Deep learning techniques have driven significant progress in speaker recognition in recent years, but Vietnamese speaker recognition still faces many challenges. This paper presents new strategies that combine transfer learning and ensemble learning to improve the accuracy of Vietnamese speaker recognition. Experimental results on Vietnamese datasets show significant improvements in recognition accuracy. These findings highlight the potential of tailored approaches to advance speaker recognition for Vietnamese speakers and to broaden its applications.

Ministry of Science and Technology of Vietnam

National Agency for Science and Technology Information and Statistics