TY - JOUR
T1 - Visual vs internal attention mechanisms in deep neural networks for image classification and object detection
AU - Obeso, Abraham Montoya
AU - Benois-Pineau, Jenny
AU - García Vázquez, Mireya Saraí
AU - Acosta, Alejandro Álvaro Ramírez
N1 - Publisher Copyright:
© 2021
PY - 2022/3
Y1 - 2022/3
N2 - The so-called “attention mechanisms” in Deep Neural Networks (DNNs) denote an automatic adaptation of DNNs to capture representative features for a specific classification task and its data. Such mechanisms operate both globally, by reinforcing feature channels, and locally, by stressing features within each feature map. Channel and feature importance are learnt in the global end-to-end DNN training process. In this paper, we present a study and propose a method with a different approach: adding supplementary visual data alongside the training images. We use human visual attention maps obtained independently through psycho-visual experiments, in both task-driven and free-viewing conditions, or predicted by powerful visual attention models. We add these visual attention maps as new data alongside the images, thus introducing human visual attention into DNN training, and compare it with both global and local automatic attention mechanisms. Experimental results show that known attention mechanisms in DNNs behave much like human visual attention, yet the proposed approach still yields faster convergence and better performance in image classification tasks.
AB - The so-called “attention mechanisms” in Deep Neural Networks (DNNs) denote an automatic adaptation of DNNs to capture representative features for a specific classification task and its data. Such mechanisms operate both globally, by reinforcing feature channels, and locally, by stressing features within each feature map. Channel and feature importance are learnt in the global end-to-end DNN training process. In this paper, we present a study and propose a method with a different approach: adding supplementary visual data alongside the training images. We use human visual attention maps obtained independently through psycho-visual experiments, in both task-driven and free-viewing conditions, or predicted by powerful visual attention models. We add these visual attention maps as new data alongside the images, thus introducing human visual attention into DNN training, and compare it with both global and local automatic attention mechanisms. Experimental results show that known attention mechanisms in DNNs behave much like human visual attention, yet the proposed approach still yields faster convergence and better performance in image classification tasks.
KW - Deep learning
KW - Image classification
KW - Object detection
KW - Saliency maps
KW - Visual attention
UR - http://www.scopus.com/inward/record.url?scp=85118873465&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2021.108411
DO - 10.1016/j.patcog.2021.108411
M3 - Article
AN - SCOPUS:85118873465
SN - 0031-3203
VL - 123
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 108411
ER -