Saliency-based selection of visual content for deep convolutional neural networks: Application to architectural style classification

A. Montoya Obeso, J. Benois-Pineau, M. S. García Vázquez, A. A. Ramírez Acosta

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

The automatic description of digital multimedia content has mainly been developed for classification tasks, retrieval systems and the massive ordering of data. Preservation of cultural heritage is a field where the application of these methods is of high importance. We address a classification problem in cultural heritage: the classification of architectural styles in digital photographs of Mexican cultural heritage. In general, selecting relevant content in the scene for training classification models makes the models more efficient in terms of accuracy and training time. Here we use a saliency-driven approach to predict visual attention in images and use it to train a deep convolutional neural network. We also present an analysis of the behavior of models trained with state-of-the-art image cropping and with saliency maps. To train models that are invariant to rotations, data augmentation of the training set is required, which poses the problem of filling and normalizing crops; we study different padding techniques and find an optimal solution. The results are compared with the state of the art in terms of accuracy and training time. Furthermore, we study saliency-based cropping in training and its generalization to another classical task, weak labeling of massive collections of images containing objects of interest; these experiments are conducted on a large subset of the ImageNet database. This work is an extension of preliminary research in terms of image padding methods and generalization to a large-scale generic database.
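A minimal sketch of the idea described in the abstract, not the authors' exact pipeline: given an image and a precomputed saliency map, extract a fixed-size crop centred on the most salient region, then pad it to the network input size. The crop size of 227 pixels (AlexNet-style input), the saliency threshold, and the padding modes compared are illustrative assumptions.

```python
import numpy as np

def salient_crop(image, saliency, crop_size=227, threshold=0.7):
    """Crop a crop_size window centred on the most salient region.

    `image` is an H x W x 3 array, `saliency` an H x W map in [0, 1].
    Parameter values are assumptions for illustration only.
    """
    ys, xs = np.where(saliency >= threshold * saliency.max())
    if len(ys) == 0:                       # no salient pixels: fall back to the centre
        cy, cx = image.shape[0] // 2, image.shape[1] // 2
    else:
        cy, cx = int(ys.mean()), int(xs.mean())
    half = crop_size // 2
    y0, x0 = max(cy - half, 0), max(cx - half, 0)
    y1 = min(y0 + crop_size, image.shape[0])
    x1 = min(x0 + crop_size, image.shape[1])
    return image[y0:y1, x0:x1]

def pad_to_square(crop, size=227, mode="reflect"):
    """Pad a (possibly smaller) crop to size x size.

    `mode` follows numpy.pad: 'constant' (zero filling), 'reflect',
    'edge', ...  The set of padding strategies shown here is an
    assumption, not necessarily the one evaluated in the paper.
    """
    pad_y = max(size - crop.shape[0], 0)
    pad_x = max(size - crop.shape[1], 0)
    return np.pad(crop, ((0, pad_y), (0, pad_x), (0, 0)), mode=mode)
```

Usage would be something like `x = pad_to_square(salient_crop(img, sal), mode="constant")`, with each padding mode producing a separate augmented training sample whose effect on accuracy and training time can then be compared.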

Original language: English
Pages (from-to): 9553-9576
Number of pages: 24
Journal: Multimedia Tools and Applications
Volume: 78
Issue number: 8
DOIs
State: Published - 1 Apr 2019

Keywords

  • Cultural heritage
  • Data selection
  • Deep learning
  • Visual attention prediction
