TY - JOUR
T1 - Visual vs internal attention mechanisms in deep neural networks for image classification and object detection
AU - Obeso, Abraham Montoya
AU - Benois-Pineau, Jenny
AU - García Vázquez, Mireya Saraí
AU - Acosta, Alejandro Álvaro Ramírez
N1 - Publisher Copyright:
© 2021
PY - 2022/3
Y1 - 2022/3
N2 - The so-called “attention mechanisms” in Deep Neural Networks (DNNs) denote an automatic adaptation of DNNs to capture representative features for a specific classification task and its data. Such mechanisms operate both globally, by reinforcing feature channels, and locally, by stressing features within each feature map. Channel and feature importance are learnt in the global end-to-end DNN training process. In this paper, we present a study and propose a method with a different approach: adding supplementary visual data alongside the training images. We use human visual attention maps obtained independently through psycho-visual experiments, in both task-driven and free-viewing conditions, or predicted by powerful visual attention models. We add these visual attention maps as new data alongside the images, thus introducing human visual attention into DNN training, and compare it with both global and local automatic attention mechanisms. Experimental results show that known attention mechanisms in DNNs behave much like human visual attention, yet the proposed approach still yields faster convergence and better performance in image classification tasks.
AB - The so-called “attention mechanisms” in Deep Neural Networks (DNNs) denote an automatic adaptation of DNNs to capture representative features for a specific classification task and its data. Such mechanisms operate both globally, by reinforcing feature channels, and locally, by stressing features within each feature map. Channel and feature importance are learnt in the global end-to-end DNN training process. In this paper, we present a study and propose a method with a different approach: adding supplementary visual data alongside the training images. We use human visual attention maps obtained independently through psycho-visual experiments, in both task-driven and free-viewing conditions, or predicted by powerful visual attention models. We add these visual attention maps as new data alongside the images, thus introducing human visual attention into DNN training, and compare it with both global and local automatic attention mechanisms. Experimental results show that known attention mechanisms in DNNs behave much like human visual attention, yet the proposed approach still yields faster convergence and better performance in image classification tasks.
KW - Deep learning
KW - Image classification
KW - Object detection
KW - Saliency maps
KW - Visual attention
UR - http://www.scopus.com/inward/record.url?scp=85118873465&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2021.108411
DO - 10.1016/j.patcog.2021.108411
M3 - Article
AN - SCOPUS:85118873465
SN - 0031-3203
VL - 123
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 108411
ER -