Dense Captioning of Natural Scenes in Spanish

Alejandro Gomez-Garay, Bogdan Raducanu, Joaquín Salas

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

The inclusion of visually impaired people in daily life is a challenging and active area of research. This work studies how to deliver information about the surroundings to people as verbal descriptions in Spanish using wearable devices. We use a neural network (DenseCap) both to identify objects and to generate phrases about them. DenseCap runs on a server and describes an image fed from a smartphone application; its text output is then verbalized by the smartphone. Our implementation achieves a mean Average Precision (mAP) of 5.0, a score that jointly measures object recognition and caption quality, and takes an average of 7.5 s from the moment a picture is taken until the verbalization in Spanish is received.
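The capture-to-verbalization pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: all function names (`caption_pipeline`, the stub backends) and the stage ordering beyond what the abstract states are assumptions.

```python
import time

# Hypothetical sketch of the smartphone-to-server pipeline: the client sends
# an image to a DenseCap server, receives region captions, obtains Spanish
# text, and hands it to a text-to-speech engine. The backends are injected
# so the flow can be shown without a real server or TTS engine.

def caption_pipeline(image_bytes, caption_fn, translate_fn, speak_fn):
    """Run one capture-to-verbalization cycle and report its latency."""
    start = time.monotonic()
    # 1. Server side: DenseCap proposes image regions and captions them.
    captions = caption_fn(image_bytes)
    # 2. Map each caption to its Spanish rendering for the end user.
    spanish = [translate_fn(c) for c in captions]
    # 3. Client side: the smartphone verbalizes the joined description.
    speak_fn(". ".join(spanish))
    return spanish, time.monotonic() - start

# Stub backends standing in for the real captioner, translator, and TTS.
fake_captions = lambda img: ["a red door", "a person walking"]
fake_translate = {"a red door": "una puerta roja",
                  "a person walking": "una persona caminando"}.get
spoken = []
spanish, latency = caption_pipeline(b"...", fake_captions,
                                    fake_translate, spoken.append)
```

With the stubs above, `spanish` holds the two Spanish captions and `spoken` holds the single sentence passed to the TTS stage; the reported 7.5 s average in the paper corresponds to `latency` measured over the real network round trip.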

Original language: English
Title of host publication: Pattern Recognition - 10th Mexican Conference, MCPR 2018, Proceedings
Editors: Jose Francisco Martinez-Trinidad, Jesus Ariel Carrasco-Ochoa, Jose Arturo Olvera-Lopez, Sudeep Sarkar
Publisher: Springer Verlag
Pages: 145-154
Number of pages: 10
ISBN (Print): 9783319921976
DOIs
State: Published - 2018
Event: 10th Mexican Conference on Pattern Recognition, MCPR 2018 - Puebla, Mexico
Duration: 27 Jun 2018 - 30 Jun 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10880 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th Mexican Conference on Pattern Recognition, MCPR 2018
Country/Territory: Mexico
City: Puebla
Period: 27/06/18 - 30/06/18

Keywords

  • Computer vision
  • Deep learning
  • Image captioning
  • Spanish language

