MoViNet-A2 for Vietnamese sign language recognition
Trương Duy Việt
Ngô Hữu Gia Huy
Phạm Đăng Khôi
Nguyễn Trần Thiên Phúc
Keywords:
deep learning; action recognition; sign language recognition; MoViNet-A2; data augmentation
Abstract
Sign language recognition from video is an essential task for supporting communication within the hearing-impaired community. However, the diversity of gestures, differing camera angles, and varying environmental conditions pose significant challenges for traditional recognition systems. In this study, we propose a Vietnamese sign language recognition method based on MoViNet-A2, an efficient model designed for video action recognition on mobile devices. The dataset consists of 98 words and phrases performed by 18 students from the Lam Dong - Da Lat School for the Disabled, totaling 4,709 videos recorded from three camera angles to ensure diversity in the training data. MoViNet-A2, pre-trained on the Kinetics-600 dataset, serves as the backbone and is combined with preprocessing techniques such as class balancing, brightness normalization, and data augmentation to improve generalization. Our method achieves a top-1 accuracy of 88.55%. Experimental results demonstrate that the proposed method performs well in classifying and recognizing sign language gestures while retaining real-time processing capability on mobile devices. This research not only improves the accuracy of sign language recognition systems but also opens up practical applications that facilitate communication for the hearing-impaired community.
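To make the pipeline described above concrete, the sketch below shows one plausible way to build such a classifier: a Kinetics-600 pre-trained MoViNet-A2 loaded from the public TensorFlow Hub checkpoint, a new 98-class head, and a simple brightness-jitter augmentation. This is a minimal illustration, not the authors' released code; the clip length, input resolution, and the strategy of stacking a dense head on the hub model's output are assumptions.

```python
# Minimal sketch (assumptions labeled) of a MoViNet-A2 transfer-learning
# setup for 98-class Vietnamese sign language recognition.
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 98           # words/phrases in the dataset
FRAMES, SIZE = 50, 224     # assumed clip length; MoViNet-A2 expects 224x224 frames

# Kinetics-600 pre-trained backbone from TF Hub. The hub signature takes a
# dict with an "image" key and returns class scores of shape [batch, 600].
backbone = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/movinet/a2/base/kinetics-600/classification/3",
    trainable=True,
)

inputs = tf.keras.layers.Input(shape=(FRAMES, SIZE, SIZE, 3), dtype=tf.float32)
features = backbone(dict(image=inputs))
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(features)
model = tf.keras.Model(inputs, outputs)  # fine-tuned end to end on the sign data

def augment(video, label):
    # Per-clip brightness jitter, applied identically to every frame so the
    # temporal structure of the gesture is preserved.
    video = tf.image.random_brightness(video, max_delta=0.2)
    return tf.clip_by_value(video, 0.0, 1.0), label
```

Class balancing, also mentioned in the abstract, could be handled at training time, for example by passing per-class weights via the `class_weight` argument of `model.fit`; the exact scheme used in the paper is not specified here.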