Automatic Recognition of Mexican Sign Language Using a Depth Camera and Recurrent Neural Networks

Kenneth Mejía-Peréz; Diana Margarita Córdova-Esparza; Juan Terven; Ana Marcela Herrera-Navarro; Teresa García-Ramírez; Alfonso Ramírez-Pedraza

doi:10.3390/app12115523

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

Automatic sign language recognition is a challenging task in machine learning and computer vision. Most prior work has focused on recognizing sign language from hand gestures alone. However, body motion and facial gestures play an essential role in sign language interaction. Taking this into account, we introduce an automatic sign language recognition system based on multiple gestures, including hands, body, and face. We used a depth camera (OAK-D) to obtain the 3D coordinates of the motions and recurrent neural networks for classification. We compare multiple model architectures based on recurrent networks, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks, and develop a noise-robust approach. For this work, we collected a dataset of 3000 samples covering 30 different signs of Mexican Sign Language (MSL), containing feature coordinates from the face, body, and hands in 3D space. After extensive evaluation and ablation studies, our best model obtained an accuracy of 97% on clean test data and 90% on highly noisy data.
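As a rough illustration of the pipeline the abstract describes (a sequence of 3D keypoint coordinates fed to a recurrent network that outputs one of 30 sign classes), the following is a minimal pure-Python sketch of a single-layer GRU classifier. It is not the authors' architecture: the number of keypoints, the sequence length, the hidden size, the omission of bias terms, and the Gaussian noise model are all assumptions made for illustration, and the weights are random and untrained. Only the 30 sign classes and the use of 3D coordinates come from the abstract.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUClassifier:
    """Single-layer GRU plus softmax output (random weights, biases omitted)."""

    def __init__(self, n_features, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One input/recurrent weight pair per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(n_hidden, n_features, rng) for g in "zrh"}
        self.U = {g: rand_matrix(n_hidden, n_hidden, rng) for g in "zrh"}
        self.V = rand_matrix(n_classes, n_hidden, rng)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # Blend the previous state and the candidate state with the update gate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def predict_proba(self, sequence):
        h = [0.0] * self.n_hidden
        for frame in sequence:
            h = self.step(frame, h)
        logits = matvec(self.V, h)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


def add_gaussian_noise(sequence, sigma, rng):
    # Assumed noise model: independent Gaussian jitter on every coordinate.
    return [[v + rng.gauss(0.0, sigma) for v in frame] for frame in sequence]


# Hypothetical sizes: 20 keypoints per frame, 15 frames per sample.
N_KEYPOINTS, N_FRAMES, N_SIGNS = 20, 15, 30
rng = random.Random(1)
sequence = [[rng.uniform(-1, 1) for _ in range(N_KEYPOINTS * 3)] for _ in range(N_FRAMES)]
noisy_sequence = add_gaussian_noise(sequence, sigma=0.05, rng=rng)

model = GRUClassifier(n_features=N_KEYPOINTS * 3, n_hidden=16, n_classes=N_SIGNS)
probs = model.predict_proba(sequence)
noisy_probs = model.predict_proba(noisy_sequence)
predicted_sign = max(range(N_SIGNS), key=probs.__getitem__)
print("predicted class index:", predicted_sign)
```

Comparing `probs` with `noisy_probs` on the same sample gives a simple way to probe how sensitive a model is to coordinate noise. A trained model would normally be built with a deep learning framework rather than written by hand as above.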

Original language	English
Article number	5523
Journal	Applied Sciences (Switzerland)
Volume	12
Issue number	11
DOI	https://doi.org/10.3390/app12115523
Status	Published - 1 Jun 2022
Published externally	Yes

Access to document

10.3390/app12115523

Other files and links

Link to publication in Scopus

Cite this

@article{bd071ed4b3f6454687de6d790897a0d3,

title = "Automatic Recognition of Mexican Sign Language Using a Depth Camera and Recurrent Neural Networks",

keywords = "RGB-D camera, recurrent neural networks, sign language",

author = "Kenneth Mej{\'i}a-Per{\'e}z and C{\'o}rdova-Esparza, {Diana Margarita} and Juan Terven and Herrera-Navarro, {Ana Marcela} and Teresa Garc{\'i}a-Ram{\'i}rez and Alfonso Ram{\'i}rez-Pedraza",

note = "Publisher Copyright: {\textcopyright} 2022 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2022",

month = jun,

day = "1",

doi = "10.3390/app12115523",

language = "English",

volume = "12",

journal = "Applied Sciences (Switzerland)",

issn = "2076-3417",

number = "11",

}

TY - JOUR

T1 - Automatic Recognition of Mexican Sign Language Using a Depth Camera and Recurrent Neural Networks

AU - Mejía-Peréz, Kenneth

AU - Córdova-Esparza, Diana Margarita

AU - Terven, Juan

AU - Herrera-Navarro, Ana Marcela

AU - García-Ramírez, Teresa

AU - Ramírez-Pedraza, Alfonso

PY - 2022/6/1

Y1 - 2022/6/1

KW - RGB-D camera

KW - recurrent neural networks

KW - sign language

UR - http://www.scopus.com/inward/record.url?scp=85131506521&partnerID=8YFLogxK

U2 - 10.3390/app12115523

DO - 10.3390/app12115523

M3 - Article

AN - SCOPUS:85131506521

SN - 2076-3417

VL - 12

JO - Applied Sciences (Switzerland)

JF - Applied Sciences (Switzerland)

IS - 11

M1 - 5523

ER -
