Saliency-based selection of visual content for deep convolutional neural networks: Application to architectural style classification

A. Montoya Obeso; J. Benois-Pineau; M. S. García Vázquez; A. A. Ramírez Acosta

doi:10.1007/s11042-018-6515-2

Centro de Investigación y Desarrollo de Tecnología Digital (CITEDI)

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

The automatic description of digital multimedia content has mainly been developed for classification tasks, retrieval systems, and the large-scale ordering of data. The preservation of cultural heritage is an important field of application for these methods. We address a classification problem in cultural heritage: the classification of architectural styles in digital photographs of Mexican cultural heritage. In general, selecting the relevant content in a scene for training classification models makes the models more efficient in terms of accuracy and training time. Here we use a saliency-driven approach to predict visual attention in images and use it to train a Deep Convolutional Neural Network. We also present an analysis of the behavior of models trained with state-of-the-art image cropping and with saliency maps. To train models that are invariant to rotations, data augmentation of the training set is required, which poses problems of filling and normalization of crops; we study different padding techniques and find an optimal solution. The results are compared with the state of the art in terms of accuracy and training time. Furthermore, we study saliency cropping in training and its generalization to another classical task, the weak labeling of massive collections of images containing objects of interest. Here the experiments are conducted on a large subset of the ImageNet database. This work extends preliminary research in terms of image padding methods and generalization to a large-scale generic database.
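The abstract does not state how crops are derived from the saliency maps, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a precomputed saliency map (any non-negative 2-D array) and picks the fixed-size window with the highest total saliency. The function name `saliency_crop` and its parameters are hypothetical and introduced here purely for illustration.

```python
import numpy as np


def saliency_crop(image, saliency, crop_h, crop_w):
    """Return the crop_h x crop_w window of `image` with the highest summed
    saliency, plus its (top, left) corner.

    Illustrative assumption only: the selection rule used in the paper is
    not given in the abstract.
    """
    h, w = saliency.shape
    if image.shape[:2] != (h, w):
        raise ValueError("image and saliency map must share height and width")
    if not (1 <= crop_h <= h and 1 <= crop_w <= w):
        raise ValueError("crop size must fit inside the image")

    # Integral image with a leading row and column of zeros, so that any
    # window sum can be read off from four corner values.
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = saliency.cumsum(axis=0).cumsum(axis=1)

    # sums[i, j] is the total saliency of the window whose top-left
    # corner is at row i, column j.
    sums = (
        integral[crop_h:, crop_w:]
        - integral[:-crop_h, crop_w:]
        - integral[crop_h:, :-crop_w]
        + integral[:-crop_h, :-crop_w]
    )

    top, left = np.unravel_index(np.argmax(sums), sums.shape)
    top, left = int(top), int(left)
    return image[top:top + crop_h, left:left + crop_w], (top, left)
```

The crop returned this way could then be resized to the network's input size. When the training set is augmented with rotations, the empty corners of a rotated crop would still need to be filled, which is the padding question the abstract refers to.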

Original language	English
Pages (from-to)	9553-9576
Number of pages	24
Journal	Multimedia Tools and Applications
Volume	78
Issue number	8
DOI	https://doi.org/10.1007/s11042-018-6515-2
Status	Published - 1 Apr 2019

Cite this

@article{1210f5ab9e7445f3b2154ab0e690853d,

title = "Saliency-based selection of visual content for deep convolutional neural networks: Application to architectural style classification",

abstract = "The automatic description of digital multimedia content has mainly been developed for classification tasks, retrieval systems, and the large-scale ordering of data. The preservation of cultural heritage is an important field of application for these methods. We address a classification problem in cultural heritage: the classification of architectural styles in digital photographs of Mexican cultural heritage. In general, selecting the relevant content in a scene for training classification models makes the models more efficient in terms of accuracy and training time. Here we use a saliency-driven approach to predict visual attention in images and use it to train a Deep Convolutional Neural Network. We also present an analysis of the behavior of models trained with state-of-the-art image cropping and with saliency maps. To train models that are invariant to rotations, data augmentation of the training set is required, which poses problems of filling and normalization of crops; we study different padding techniques and find an optimal solution. The results are compared with the state of the art in terms of accuracy and training time. Furthermore, we study saliency cropping in training and its generalization to another classical task, the weak labeling of massive collections of images containing objects of interest. Here the experiments are conducted on a large subset of the ImageNet database. This work extends preliminary research in terms of image padding methods and generalization to a large-scale generic database.",

keywords = "Cultural heritage, Data selection, Deep learning, Visual attention prediction",

author = "{Montoya Obeso}, A. and Benois-Pineau, J. and {Garc{\'i}a V{\'a}zquez}, M. S. and {Ram{\'i}rez Acosta}, A. A.",

note = "Publisher Copyright: {\textcopyright} 2018, Springer Science+Business Media, LLC, part of Springer Nature.",

year = "2019",

month = apr,

day = "1",

doi = "10.1007/s11042-018-6515-2",

language = "English",

volume = "78",

pages = "9553--9576",

journal = "Multimedia Tools and Applications",

issn = "1380-7501",

number = "8",

}

Saliency-based selection of visual content for deep convolutional neural networks: Application to architectural style classification. / Montoya Obeso, A.; Benois-Pineau, J.; García Vázquez, M. S. et al.
In: Multimedia Tools and Applications, Vol. 78, No. 8, 01.04.2019, p. 9553-9576.

TY - JOUR

T1 - Saliency-based selection of visual content for deep convolutional neural networks

T2 - Application to architectural style classification

AU - Montoya Obeso, A.

AU - Benois-Pineau, J.

AU - García Vázquez, M. S.

AU - Ramírez Acosta, A. A.

PY - 2019/4/1

Y1 - 2019/4/1

N2 - The automatic description of digital multimedia content has mainly been developed for classification tasks, retrieval systems, and the large-scale ordering of data. The preservation of cultural heritage is an important field of application for these methods. We address a classification problem in cultural heritage: the classification of architectural styles in digital photographs of Mexican cultural heritage. In general, selecting the relevant content in a scene for training classification models makes the models more efficient in terms of accuracy and training time. Here we use a saliency-driven approach to predict visual attention in images and use it to train a Deep Convolutional Neural Network. We also present an analysis of the behavior of models trained with state-of-the-art image cropping and with saliency maps. To train models that are invariant to rotations, data augmentation of the training set is required, which poses problems of filling and normalization of crops; we study different padding techniques and find an optimal solution. The results are compared with the state of the art in terms of accuracy and training time. Furthermore, we study saliency cropping in training and its generalization to another classical task, the weak labeling of massive collections of images containing objects of interest. Here the experiments are conducted on a large subset of the ImageNet database. This work extends preliminary research in terms of image padding methods and generalization to a large-scale generic database.

KW - Cultural heritage

KW - Data selection

KW - Deep learning

KW - Visual attention prediction

UR - http://www.scopus.com/inward/record.url?scp=85053042423&partnerID=8YFLogxK

U2 - 10.1007/s11042-018-6515-2

DO - 10.1007/s11042-018-6515-2

M3 - Article

SN - 1380-7501

VL - 78

SP - 9553

EP - 9576

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

IS - 8

ER -
