Who said that? The crossmodal matching identity for inferring unfamiliar faces from voices

E. A. Escoto Sotelo, Tomoaki Nakamura, Takayuki Nagai, E. Escamilla Hernandez

Research output: Chapter in book/report/conference proceeding; Conference contribution; Peer-reviewed

2 Citations (Scopus)


This paper proposes a method for matching an unfamiliar person's face to an unfamiliar voice. The idea is motivated by human crossmodal perception, which includes many illusions such as the McGurk effect and the ventriloquist illusion. In particular, we focus on recent psychological evidence suggesting that humans can, to some extent, match unfamiliar faces to unfamiliar voices. The aim of this paper is to mimic this ability on a computer. To match an unfamiliar person's face to an unfamiliar voice, a dataset of paired facial images and corresponding voices is used as prior knowledge. That is, the unfamiliar voice is matched to the closest known speaker model. Since the database contains the corresponding facial image, the system can estimate the closest known face from the unfamiliar voice. Finally, each unfamiliar face is matched to the estimated known face, and the final recognition result is obtained. To this end, we first implement a speaker recognition system that uses Mel Frequency Cepstral Coefficients as the speech feature and Gaussian mixture models as the classifier. We also use a two-dimensional HMM-based face recognizer and propose a statistical integration of the audio and visual recognition results. To show the feasibility of the proposed system, unfamiliar speaker recognition experiments are carried out using 60 sentences from the ATR-503 sentence set, uttered by 20 university students.
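The matching chain described in the abstract (unfamiliar voice to closest known speaker, known speaker to known face, known face to unfamiliar candidate face) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `logsumexp` and `match_faces_to_voice` are hypothetical, the log-likelihood scores are placeholder numbers standing in for the outputs of the GMM speaker models and the 2D-HMM face recognizer, and the "soft" rule that weights every known identity by its voice posterior is only one plausible reading of "statistical integration", assumed here for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def match_faces_to_voice(voice_loglik, face_loglik):
    """Pick which unfamiliar candidate face goes with an unfamiliar voice.

    voice_loglik[k]   : log-likelihood of the unfamiliar voice under the
                        k-th known speaker model (placeholder scores).
    face_loglik[n][k] : log-likelihood of the n-th unfamiliar candidate face
                        under the k-th known face model (placeholder scores).

    Returns (k_star, hard, soft):
      k_star : index of the closest known speaker
      hard   : candidate face chosen via that single known identity
      soft   : candidate face chosen by summing over all known identities,
               weighted by the voice posterior (an assumed integration rule)
    """
    n_known = len(voice_loglik)
    n_faces = len(face_loglik)

    # Step 1: the unfamiliar voice is matched to the closest known speaker.
    k_star = max(range(n_known), key=lambda k: voice_loglik[k])

    # Step 2: the known speaker's paired face is the estimated known face;
    # choose the candidate face that best matches that known face model.
    hard = max(range(n_faces), key=lambda n: face_loglik[n][k_star])

    # Assumed soft variant: normalise voice scores into log-posteriors
    # (uniform prior over known identities), then marginalise over them.
    norm = logsumexp(voice_loglik)
    voice_logpost = [v - norm for v in voice_loglik]
    soft_scores = [
        logsumexp([f + p for f, p in zip(row, voice_logpost)])
        for row in face_loglik
    ]
    soft = max(range(n_faces), key=lambda n: soft_scores[n])

    return k_star, hard, soft


# Toy example: 3 known identities, 2 unfamiliar candidate faces.
voice_scores = [-10.0, -2.0, -7.0]
face_scores = [
    [-1.0, -9.0, -3.0],  # candidate face 0
    [-8.0, -2.0, -6.0],  # candidate face 1
]
print(match_faces_to_voice(voice_scores, face_scores))  # prints (1, 1, 1)
```

In this toy case both rules agree; they can differ when the voice scores for several known speakers are close, which is where combining the audio and visual evidence statistically would matter.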

Original language: English
Title of host publication: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012
Number of pages: 8
Status: Published - 2012
Event: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012 - Sorrento, Italy
Duration: 25 Nov 2012 to 29 Nov 2012

Publication series

Name: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012


Conference: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012

