MoViNet-A2 for Vietnamese sign language recognition

  • Trương Duy Việt
  • Ngô Hữu Gia Huy
  • Phạm Đăng Khôi
  • Nguyễn Trần Thiên Phúc
Keywords: deep learning; action recognition; sign language recognition; MoViNet-A2; data augmentation

Abstract

Sign language recognition from video is an essential task to support communication for the hearing-impaired community. However, the diversity of gestures, different camera angles, and varying environmental conditions pose significant challenges for traditional recognition systems. In this study, we propose a Vietnamese sign language recognition method based on MoViNet-A2, an advanced model optimized for action recognition in videos on mobile devices. The research dataset consists of 98 words or phrases, performed by 18 students from Lam Dong - Da Lat School for the Disabled, with a total of 4,709 videos from three different camera angles, ensuring diversity in training data. MoViNet-A2 serves as the backbone, pre-trained on the Kinetics-600 dataset. It is combined with preprocessing techniques such as class balancing, brightness normalization, and data augmentation to improve model generalization. Our method achieves a Top-1 Accuracy of 88.55%. Experimental results demonstrate that the proposed method achieves high performance in classifying and recognizing sign language gestures while ensuring real-time processing capabilities on mobile devices. This research not only improves the accuracy of sign language recognition systems but also opens up practical applications in facilitating communication for the hearing-impaired community.
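The abstract mentions brightness normalization and data augmentation as preprocessing steps. As a minimal illustrative sketch (not the authors' code), the snippet below shows one plausible form of each step in plain Python, assuming grayscale video frames stored as nested lists of pixel values in [0, 255]; the function names and parameters are our own, not taken from the paper.

```python
# Illustrative sketch of two preprocessing steps named in the abstract:
# brightness normalization and a simple augmentation. A "video" here is a
# list of frames; each frame is a 2D list of grayscale pixels in [0, 255].

def normalize_brightness(video, target_mean=128.0):
    """Scale every pixel so the video's mean brightness matches target_mean."""
    pixels = [p for frame in video for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    scale = target_mean / mean if mean > 0 else 1.0
    return [[[min(255.0, p * scale) for p in row] for row in frame]
            for frame in video]

def flip_horizontal(video):
    """Mirror each frame left-to-right, a common video augmentation.
    (For sign language this swaps handedness, so it must be applied with care.)"""
    return [[row[::-1] for row in frame] for frame in video]

# Tiny 2-frame, 2x2 example "video"
video = [[[100.0, 150.0], [200.0, 50.0]],
         [[120.0, 130.0], [140.0, 110.0]]]
normed = normalize_brightness(video)
flipped = flip_horizontal(video)
```

In a real pipeline these operations would run on tensors (e.g. NumPy arrays or TensorFlow ops) over RGB frames before the clips are fed to the MoViNet-A2 backbone; the list-based version above only makes the arithmetic explicit.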

Authors

Trương Duy Việt
Dalat College, Dalat City
Ngô Hữu Gia Huy
Dalat College, Dalat City
Phạm Đăng Khôi
Dalat College, Dalat City
Nguyễn Trần Thiên Phúc
Dalat College, Dalat City
Published
2025-07-20
Section
Articles