RECOGNIZING AND UNDERSTANDING AMERICAN SIGN LANGUAGE USING DEEP LEARNING AND MEDIAPIPE
DOI: 10.18173/2354-1059.2026-0006
Abstract
Sign language serves as the primary medium of expression for deaf and hard-of-hearing individuals. However, interpersonal communication remains challenging, as these individuals rely primarily on sign language to express their thoughts. To address this issue, this study proposes an automated sign language recognition and interpretation framework that integrates static and dynamic recognition components. Specifically, a Convolutional Neural Network (CNN) is employed for static gesture classification, while a hybrid CNN-Long Short-Term Memory (CNN-LSTM) architecture captures the spatiotemporal features of dynamic signs. Furthermore, MediaPipe is leveraged for robust landmark localization to enhance feature extraction. The American Sign Language (ASL) dataset used in this study ensures diversity in sign representation, covering variations in hand shapes, positions, and movements. The proposed models achieved high accuracy on the test datasets, with the CNN model reaching 93.1% and the CNN-LSTM model 94.1%, confirming their effectiveness in ASL recognition tasks.
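To make the described pipeline concrete, the following is a minimal sketch of how MediaPipe hand landmarks can feed a CNN-LSTM classifier over frame sequences. The layer sizes, sequence length (30 frames), and 26-class output are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch: MediaPipe landmark extraction + a CNN-LSTM classifier.
# Layer widths, seq_len=30, and n_classes=26 are assumptions for illustration.
import cv2
import mediapipe as mp
import numpy as np
from tensorflow.keras import layers, models

mp_hands = mp.solutions.hands

def extract_landmarks(frame_bgr, hands):
    """Return a flat (63,) vector of 21 hand landmarks (x, y, z), or zeros."""
    result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        return np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32).ravel()
    return np.zeros(63, dtype=np.float32)  # no hand detected in this frame

def build_cnn_lstm(seq_len=30, n_features=63, n_classes=26):
    """1D convolutions over per-frame landmarks, then an LSTM over time."""
    model = models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.Conv1D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.LSTM(128),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage sketch: collect per-frame landmark vectors into a sequence, then classify.
hands = mp_hands.Hands(static_image_mode=False, max_num_hands=1)
model = build_cnn_lstm()
```

The design choice here mirrors the abstract: landmark coordinates replace raw pixels as input features, so the convolutional layers learn per-frame spatial patterns while the LSTM models the temporal dynamics of a sign.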