Who said that? The crossmodal matching identity for inferring unfamiliar faces from voices

E. A. Escoto Sotelo, Tomoaki Nakamura, Takayuki Nagai, E. Escamilla Hernandez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

This paper proposes a method for matching an unfamiliar person's face to an unfamiliar voice. The idea is motivated by human crossmodal perception, which gives rise to illusions such as the McGurk effect and the ventriloquist illusion. In particular, we focus on recent psychological evidence suggesting that humans can, to some extent, match unfamiliar faces to unfamiliar voices. The aim of this paper is to mimic this ability on a computer. To match an unfamiliar person's face to an unfamiliar voice, a dataset of paired facial images and corresponding voices is used as prior knowledge. The unfamiliar voice is first matched to the closest known speaker model. Since the database contains the corresponding facial image, the system can then estimate the closest known face from the unfamiliar voice. Finally, each unfamiliar face is matched against the estimated known face to obtain the final recognition result. To this end, we first implement a speaker recognition system that uses Mel-frequency cepstral coefficients as the speech feature and Gaussian mixture models as the classifier. We also use a two-dimensional HMM-based face recognizer and propose a statistical integration of the audio and visual recognition results. To demonstrate the feasibility of the proposed system, unfamiliar speaker recognition experiments are carried out using 60 sentences from the ATR-503 sentence set, uttered by 20 university students.
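The matching chain described in the abstract (unfamiliar voice, then closest known speaker, then known face, then unfamiliar face) can be illustrated with a small sketch. This is not the authors' implementation and rests on several assumptions: the voice features are synthetic 4-D vectors standing in for MFCC frames, faces are 2-D vectors compared by squared Euclidean distance as a stand-in for the pseudo 2-D HMM face recognizer, and the integration rule (a posterior-weighted sum over known speakers) is only one plausible reading of "statistical integration". All function and variable names are hypothetical.

```python
import numpy as np

def _logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def _component_log_probs(X, weights, means, variances):
    # Log of weight_k * N(x | mean_k, diag(var_k)) for every frame and component.
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    return np.log(weights)[None, :] + log_gauss

def fit_diag_gmm(X, n_components=2, n_iter=30, seed=0):
    """Fit a diagonal-covariance GMM with plain EM (assumed stand-in for a speaker model)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + 1e-3, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each frame.
        log_p = _component_log_probs(X, weights, means, variances)
        resp = np.exp(log_p - _logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, 1e-3)
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    # Mean per-frame log-likelihood of the utterance under one speaker model.
    return _logsumexp(_component_log_probs(X, *gmm), axis=1).mean()

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Knowledge base: paired voice models and face vectors for three known people (synthetic).
known_gmms = [fit_diag_gmm(rng.normal(c, 1.0, size=(200, 4))) for c in (0.0, 5.0, 10.0)]
known_faces = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])

# An unfamiliar voice and two candidate unfamiliar faces (synthetic).
unfamiliar_voice = rng.normal(5.5, 1.0, size=(100, 4))
candidate_faces = np.array([[5.2, 4.9], [9.5, 10.3]])

# Step 1: match the unfamiliar voice to the known speaker models.
voice_scores = np.array([avg_log_likelihood(unfamiliar_voice, g) for g in known_gmms])
voice_posterior = softmax(voice_scores)
best_known = int(np.argmax(voice_posterior))

# Step 2: compare each candidate face with each known face (assumed similarity measure).
sq_dist = ((known_faces[:, None, :] - candidate_faces[None, :, :]) ** 2).sum(axis=2)
face_given_known = softmax(-sq_dist, axis=1)  # rows: known people, columns: candidates

# Step 3: integrate both modalities by weighting face scores with the voice posterior.
combined = voice_posterior @ face_given_known
matched_face = int(np.argmax(combined))

print("closest known speaker:", best_known)
print("matched candidate face:", matched_face)
```

In this toy setup the voice posterior concentrates on one known speaker, so the integration step behaves almost like a hard lookup of that speaker's face. With less separable voices, the weighted sum would let several known speakers contribute to the final face score.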

Original language: English
Title of host publication: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012
Pages: 97-104
Number of pages: 8
State: Published - 2012
Event: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012 - Sorrento, Italy
Duration: 25 Nov 2012 → 29 Nov 2012

Publication series

Name: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012

Conference

Conference: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012
Country/Territory: Italy
City: Sorrento
Period: 25/11/12 → 29/11/12

Keywords

  • Audio-visual integration cross-modal matching identity system
  • EM algorithm
  • Gaussian mixture models
  • Pseudo 2-D Hidden Markov model
