Very deep convolutional neural network for speech recognition based on words

Javier O. Pinzon, Robinson Jimenez-Moreno, Oscar Aviles, Paola Nino, Diana Ovalle

Research output: Contribution to journal › Article › peer-review

Abstract

This study presents the implementation of two very deep convolutional neural network architectures for speech recognition based on complete words, in this case 12 specific words, and evaluates their performance in two types of environment, one semicontrolled and one non-controlled. One architecture uses linear filters in frequency only, while the other uses linear filters in both frequency and time. The power spectral density, together with its first and second derivatives, is proposed as the network input in order to broaden the variety of feature maps available to neural networks for speech recognition. In real-time tests, the architecture with filters in frequency and time reaches an error rate of 16.67% in the semicontrolled environment, while the other architecture obtains 41.67%. The architecture with the lower error rate therefore performs better for word recognition, even with small databases that are specialized to a particular group of people.
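The input representation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length, hop size, Hann window, log scaling, the choice to take the derivatives along the time axis, and the function names `psd_frames` and `with_derivatives` are all assumptions made for illustration.

```python
import numpy as np

def psd_frames(signal, frame_len=256, hop=128):
    """Log power spectral density per frame (windowed periodogram).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)
    psd = np.abs(spectrum) ** 2 / np.sum(window ** 2)
    # Small offset avoids log(0); result has shape (freq_bins, n_frames).
    return np.log(psd + 1e-10).T

def with_derivatives(psd):
    """Stack the PSD with its first and second derivatives as channels.

    Assumption: derivatives are finite differences along the time axis.
    """
    d1 = np.gradient(psd, axis=1)
    d2 = np.gradient(d1, axis=1)
    return np.stack([psd, d1, d2], axis=0)

# Synthetic one-second signal at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
features = with_derivatives(psd_frames(signal))
print(features.shape)  # (channels, freq_bins, n_frames)
```

The resulting three-channel array has the same layout as a multi-channel image, so it could be fed to a convolutional network whose filters span frequency only or both frequency and time.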

Original language: English
Pages (from-to): 6680-6685
Number of pages: 6
Journal: Journal of Engineering and Applied Sciences
Volume: 13
Issue number: 16
State: Published - 2018
Externally published: Yes

Keywords

  • CNN architecture
  • Deep convolutional neural network
  • Power spectral density
  • Proposed
  • Speech recognition
