TY - JOUR
T1 - Automatic Recognition of Mexican Sign Language Using a Depth Camera and Recurrent Neural Networks
AU - Mejía-Peréz, Kenneth
AU - Córdova-Esparza, Diana Margarita
AU - Terven, Juan
AU - Herrera-Navarro, Ana Marcela
AU - García-Ramírez, Teresa
AU - Ramírez-Pedraza, Alfonso
N1 - Publisher Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2022/6/1
Y1 - 2022/6/1
N2 - Automatic sign language recognition is a challenging task in machine learning and computer vision. Most works have focused on recognizing sign language using hand gestures only. However, body motion and facial gestures play an essential role in sign language interaction. Taking this into account, we introduce an automatic sign language recognition system based on multiple gestures, including hands, body, and face. We used a depth camera (OAK-D) to obtain the 3D coordinates of the motions and recurrent neural networks for classification. We compare multiple model architectures based on recurrent networks such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks and develop a noise-robust approach. For this work, we collected a dataset of 3000 samples from 30 different signs of the Mexican Sign Language (MSL) containing 3D feature coordinates from the face, body, and hands. After extensive evaluation and ablation studies, our best model obtained an accuracy of 97% on clean test data and 90% on highly noisy data.
AB - Automatic sign language recognition is a challenging task in machine learning and computer vision. Most works have focused on recognizing sign language using hand gestures only. However, body motion and facial gestures play an essential role in sign language interaction. Taking this into account, we introduce an automatic sign language recognition system based on multiple gestures, including hands, body, and face. We used a depth camera (OAK-D) to obtain the 3D coordinates of the motions and recurrent neural networks for classification. We compare multiple model architectures based on recurrent networks such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks and develop a noise-robust approach. For this work, we collected a dataset of 3000 samples from 30 different signs of the Mexican Sign Language (MSL) containing 3D feature coordinates from the face, body, and hands. After extensive evaluation and ablation studies, our best model obtained an accuracy of 97% on clean test data and 90% on highly noisy data.
KW - RGB-D camera
KW - recurrent neural networks
KW - sign language
UR - http://www.scopus.com/inward/record.url?scp=85131506521&partnerID=8YFLogxK
U2 - 10.3390/app12115523
DO - 10.3390/app12115523
M3 - Article
AN - SCOPUS:85131506521
SN - 2076-3417
VL - 12
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 11
M1 - 5523
ER -