TY - JOUR
T1 - Saliency-based selection of visual content for deep convolutional neural networks
T2 - Application to architectural style classification
AU - Obeso, A. Montoya
AU - Benois-Pineau, J.
AU - Vázquez, M. S. García
AU - Acosta, A. A. Ramírez
N1 - Publisher Copyright: © 2018, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2019/4/1
Y1 - 2019/4/1
N2 - The automatic description of digital multimedia content has mainly been developed for classification tasks, retrieval systems and large-scale ordering of data. Preservation of cultural heritage is an important application field for these methods. We address a classification problem in cultural heritage: the classification of architectural styles in digital photographs of Mexican cultural heritage. In general, selecting the relevant content in a scene for training classification models makes the models more efficient in terms of accuracy and training time. Here we use a saliency-driven approach to predict visual attention in images and use it to train a Deep Convolutional Neural Network. We also present an analysis of the behavior of models trained with state-of-the-art image cropping and with saliency maps. To train models that are invariant to rotations, data augmentation of the training set is required, which poses the problem of filling crops during normalization; we study different padding techniques and find an optimal solution. The results are compared with the state of the art in terms of accuracy and training time. Furthermore, we study saliency cropping in training and its generalization to another classical task, the weak labeling of massive collections of images containing objects of interest. Here the experiments are conducted on a large subset of the ImageNet database. This work extends preliminary research with respect to image padding methods and generalization to a large-scale generic database.
AB - The automatic description of digital multimedia content has mainly been developed for classification tasks, retrieval systems and large-scale ordering of data. Preservation of cultural heritage is an important application field for these methods. We address a classification problem in cultural heritage: the classification of architectural styles in digital photographs of Mexican cultural heritage. In general, selecting the relevant content in a scene for training classification models makes the models more efficient in terms of accuracy and training time. Here we use a saliency-driven approach to predict visual attention in images and use it to train a Deep Convolutional Neural Network. We also present an analysis of the behavior of models trained with state-of-the-art image cropping and with saliency maps. To train models that are invariant to rotations, data augmentation of the training set is required, which poses the problem of filling crops during normalization; we study different padding techniques and find an optimal solution. The results are compared with the state of the art in terms of accuracy and training time. Furthermore, we study saliency cropping in training and its generalization to another classical task, the weak labeling of massive collections of images containing objects of interest. Here the experiments are conducted on a large subset of the ImageNet database. This work extends preliminary research with respect to image padding methods and generalization to a large-scale generic database.
KW - Cultural heritage
KW - Data selection
KW - Deep learning
KW - Visual attention prediction
UR - http://www.scopus.com/inward/record.url?scp=85053042423&partnerID=8YFLogxK
U2 - 10.1007/s11042-018-6515-2
DO - 10.1007/s11042-018-6515-2
M3 - Article
SN - 1380-7501
VL - 78
SP - 9553
EP - 9576
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 8
ER -