Dense Captioning of Natural Scenes in Spanish

Alejandro Gomez-Garay, Bogdan Raducanu, Joaquín Salas

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

The inclusion of visually impaired people in daily life is a challenging and active area of research. This work studies how to deliver information about the surroundings to people as verbal descriptions in Spanish using wearable devices. We use a neural network (DenseCap) both to identify objects and to generate phrases about them. DenseCap runs on a server and describes an image fed from a smartphone application; its text output is then verbalized by the smartphone. Our implementation achieves a mean Average Precision (mAP) of 5.0, a score that jointly measures object recognition and caption quality, and takes an average of 7.5 s from the moment a picture is taken until the verbalization in Spanish is received.
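The capture-to-verbalization pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: all function names (`caption_pipeline`, the stub backends) and the stage ordering beyond what the abstract states are assumptions.

```python
import time

# Hypothetical sketch of the smartphone-to-server pipeline: the client sends
# an image to a DenseCap server, receives region captions, obtains Spanish
# text, and hands it to a text-to-speech engine. The backends are injected
# so the flow can be shown without a real server or TTS engine.

def caption_pipeline(image_bytes, caption_fn, translate_fn, speak_fn):
    """Run one capture-to-verbalization cycle and report its latency."""
    start = time.monotonic()
    # 1. Server side: DenseCap proposes image regions and captions them.
    captions = caption_fn(image_bytes)
    # 2. Map each caption to its Spanish rendering for the end user.
    spanish = [translate_fn(c) for c in captions]
    # 3. Client side: the smartphone verbalizes the joined description.
    speak_fn(". ".join(spanish))
    return spanish, time.monotonic() - start

# Stub backends standing in for the real captioner, translator, and TTS.
fake_captions = lambda img: ["a red door", "a person walking"]
fake_translate = {"a red door": "una puerta roja",
                  "a person walking": "una persona caminando"}.get
spoken = []
spanish, latency = caption_pipeline(b"...", fake_captions,
                                    fake_translate, spoken.append)
```

With the stubs above, `spanish` holds the two Spanish captions and `spoken` holds the single sentence passed to the TTS stage; the reported 7.5 s average in the paper corresponds to `latency` measured over the real network round trip.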

Original language: English
Title of host publication: Pattern Recognition - 10th Mexican Conference, MCPR 2018, Proceedings
Editors: Jose Francisco Martinez-Trinidad, Jesus Ariel Carrasco-Ochoa, Jose Arturo Olvera-Lopez, Sudeep Sarkar
Publisher: Springer Verlag
Pages: 145-154
Number of pages: 10
ISBN (Print): 9783319921976
DOIs
State: Published - 2018
Event: 10th Mexican Conference on Pattern Recognition, MCPR 2018 - Puebla, Mexico
Duration: 27 Jun 2018 - 30 Jun 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10880 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th Mexican Conference on Pattern Recognition, MCPR 2018
Country/Territory: Mexico
City: Puebla
Period: 27/06/18 - 30/06/18

Keywords

  • Computer vision
  • Deep learning
  • Image captioning
  • Spanish language

