Fast and accurate real-time semantic segmentation with dilated asymmetric convolutions

Leonel Rosas-Arias; Gibran Benitez-Garcia; José Portillo-Portillo; Gabriel Sánchez-Pérez; Keiji Yanai

doi:10.1109/ICPR48806.2021.9413176

Escuela Superior de Ingeniería Mecánica y Eléctrica (ESIME), Unidad Culhuacán

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

Recent work has shown promising results on real-time semantic segmentation tasks. To maintain fast inference speed, most existing networks use light decoders or omit the decoder entirely; however, their accuracy is significantly lower than that of non-real-time semantic segmentation networks. In this paper, we introduce two key modules for building a high-performance decoder for real-time semantic segmentation, with the goal of reducing the accuracy gap between real-time and non-real-time segmentation networks. Our first module, Dilated Asymmetric Pyramidal Fusion (DAPF), is designed to substantially increase the receptive field on top of the last stage of the encoder, obtaining richer contextual features. Our second module, the Multi-resolution Dilated Asymmetric (MDA) module, fuses and refines detail and contextual information from multi-scale feature maps coming from early and deeper stages of the network. Both modules use asymmetric convolutions to exploit contextual information without excessively increasing computational complexity. Our proposed network, “FASSD-Net”, reaches 78.8% mIoU on the Cityscapes validation dataset at 41.1 FPS on full-resolution images (1024×2048). In addition, a light version of our network reaches 74.1% mIoU at 133.1 FPS (full resolution) on a single NVIDIA GTX 1080Ti card with no additional acceleration techniques. The source code and pre-trained models are available at github.com/GibranBenitez/FASSD-Net.
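The abstract's claim that asymmetric convolutions add context cheaply can be illustrated with simple arithmetic. The sketch below is not taken from the FASSD-Net source code: it assumes the common reading of "asymmetric convolution" as a k×k kernel factorized into a k×1 kernel followed by a 1×k kernel, and the channel count (128) and dilation rates are illustrative values chosen here, not figures from the paper. The function names are hypothetical.

```python
def conv_params(kh, kw, c_in, c_out, bias=False):
    """Number of learnable weights in a 2D convolution with a kh x kw kernel."""
    return kh * kw * c_in * c_out + (c_out if bias else 0)


def effective_extent(k, dilation):
    """Pixels spanned along one axis by a k-tap kernel with the given dilation."""
    return dilation * (k - 1) + 1


def compare(k, channels, dilation):
    """Compare a full k x k convolution with a (k x 1) + (1 x k) factorization.

    Assumption: both layers keep the same number of input and output channels.
    """
    full = conv_params(k, k, channels, channels)
    asym = conv_params(k, 1, channels, channels) + conv_params(1, k, channels, channels)
    return {
        "full": full,
        "asymmetric": asym,
        "ratio": asym / full,
        "extent": effective_extent(k, dilation),
    }


if __name__ == "__main__":
    # Illustrative settings only: 3-tap kernels, 128 channels, growing dilation.
    for d in (1, 2, 4, 8):
        print(d, compare(3, 128, d))
```

Under these assumptions, the factorized pair always uses 2/k of the weights of the full kernel (two thirds for k = 3), while dilation widens the span covered along each axis without adding any weights.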

Original language	English
Title of host publication	Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	2264-2271
Number of pages	8
ISBN (Electronic)	9781728188089
DOIs	https://doi.org/10.1109/ICPR48806.2021.9413176
State	Published - 2020
Event	25th International Conference on Pattern Recognition, ICPR 2020 - Virtual, Milan, Italy Duration: 10 Jan 2021 → 15 Jan 2021

Publication series

Name	Proceedings - International Conference on Pattern Recognition
ISSN (Print)	1051-4651

Conference

Conference	25th International Conference on Pattern Recognition, ICPR 2020
Country/Territory	Italy
City	Virtual, Milan
Period	10/01/21 → 15/01/21

Access to Document

10.1109/ICPR48806.2021.9413176

Cite this

Rosas-Arias, L., Benitez-Garcia, G., Portillo-Portillo, J., Sánchez-Pérez, G., & Yanai, K. (2020). Fast and accurate real-time semantic segmentation with dilated asymmetric convolutions. In Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition (pp. 2264-2271). Article 9413176 (Proceedings - International Conference on Pattern Recognition). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPR48806.2021.9413176

Rosas-Arias, Leonel ; Benitez-Garcia, Gibran ; Portillo-Portillo, José et al. / Fast and accurate real-time semantic segmentation with dilated asymmetric convolutions. Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2020. pp. 2264-2271 (Proceedings - International Conference on Pattern Recognition).

@inproceedings{ebfdab9053fb4a8bbff9ae3039b44735,

title = "Fast and accurate real-time semantic segmentation with dilated asymmetric convolutions",

abstract = "Recent works have shown promising results applied to real-time semantic segmentation tasks. To maintain fast inference speed, most of the existing networks make use of light decoders, or they simply do not use them at all. This strategy helps to maintain a fast inference speed; however, their accuracy performance is significantly lower in comparison to non-real-time semantic segmentation networks. In this paper, we introduce two key modules aimed to design a high-performance decoder for real-time semantic segmentation for reducing the accuracy gap between real-time and non-real-time segmentation networks. Our first module, Dilated Asymmetric Pyramidal Fusion (DAPF), is designed to substantially increase the receptive field on the top of the last stage of the encoder, obtaining richer contextual features. Our second module, Multi-resolution Dilated Asymmetric (MDA) module, fuses and refines detail and contextual information from multi-scale feature maps coming from early and deeper stages of the network. Both modules exploit contextual information without excessively increasing the computational complexity by using asymmetric convolutions. Our proposed network entitled “FASSD-Net” reaches 78.8% of mIoU accuracy on the Cityscapes validation dataset at 41.1 FPS on full resolution images (1024×2048). Besides, with a light version of our network, we reach 74.1% of mIoU at 133.1 FPS (full resolution) on a single NVIDIA GTX 1080Ti card with no additional acceleration techniques. The source code and pre-trained models are available at github.com/GibranBenitez/FASSD-Net.",

author = "Leonel Rosas-Arias and Gibran Benitez-Garcia and Jos{\'e} Portillo-Portillo and Gabriel S{\'a}nchez-P{\'e}rez and Keiji Yanai",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE; 25th International Conference on Pattern Recognition, ICPR 2020 ; Conference date: 10-01-2021 Through 15-01-2021",

year = "2020",

doi = "10.1109/ICPR48806.2021.9413176",

language = "English",

series = "Proceedings - International Conference on Pattern Recognition",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "2264--2271",

booktitle = "Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition",

address = "United States",

}

Rosas-Arias, L, Benitez-Garcia, G, Portillo-Portillo, J, Sánchez-Pérez, G & Yanai, K 2020, Fast and accurate real-time semantic segmentation with dilated asymmetric convolutions. in Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition., 9413176, Proceedings - International Conference on Pattern Recognition, Institute of Electrical and Electronics Engineers Inc., pp. 2264-2271, 25th International Conference on Pattern Recognition, ICPR 2020, Virtual, Milan, Italy, 10/01/21. https://doi.org/10.1109/ICPR48806.2021.9413176

Fast and accurate real-time semantic segmentation with dilated asymmetric convolutions. / Rosas-Arias, Leonel; Benitez-Garcia, Gibran; Portillo-Portillo, José et al.
Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition. Institute of Electrical and Electronics Engineers Inc., 2020. p. 2264-2271 9413176 (Proceedings - International Conference on Pattern Recognition).

TY - GEN

T1 - Fast and accurate real-time semantic segmentation with dilated asymmetric convolutions

AU - Rosas-Arias, Leonel

AU - Benitez-Garcia, Gibran

AU - Portillo-Portillo, José

AU - Sánchez-Pérez, Gabriel

AU - Yanai, Keiji

PY - 2020

Y1 - 2020

AB - Recent works have shown promising results applied to real-time semantic segmentation tasks. To maintain fast inference speed, most of the existing networks make use of light decoders, or they simply do not use them at all. This strategy helps to maintain a fast inference speed; however, their accuracy performance is significantly lower in comparison to non-real-time semantic segmentation networks. In this paper, we introduce two key modules aimed to design a high-performance decoder for real-time semantic segmentation for reducing the accuracy gap between real-time and non-real-time segmentation networks. Our first module, Dilated Asymmetric Pyramidal Fusion (DAPF), is designed to substantially increase the receptive field on the top of the last stage of the encoder, obtaining richer contextual features. Our second module, Multi-resolution Dilated Asymmetric (MDA) module, fuses and refines detail and contextual information from multi-scale feature maps coming from early and deeper stages of the network. Both modules exploit contextual information without excessively increasing the computational complexity by using asymmetric convolutions. Our proposed network entitled “FASSD-Net” reaches 78.8% of mIoU accuracy on the Cityscapes validation dataset at 41.1 FPS on full resolution images (1024×2048). Besides, with a light version of our network, we reach 74.1% of mIoU at 133.1 FPS (full resolution) on a single NVIDIA GTX 1080Ti card with no additional acceleration techniques. The source code and pre-trained models are available at github.com/GibranBenitez/FASSD-Net.

UR - http://www.scopus.com/inward/record.url?scp=85110549629&partnerID=8YFLogxK

U2 - 10.1109/ICPR48806.2021.9413176

DO - 10.1109/ICPR48806.2021.9413176

M3 - Conference contribution

AN - SCOPUS:85110549629

T3 - Proceedings - International Conference on Pattern Recognition

SP - 2264

EP - 2271

BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 25th International Conference on Pattern Recognition, ICPR 2020

Y2 - 10 January 2021 through 15 January 2021

ER -

Rosas-Arias L, Benitez-Garcia G, Portillo-Portillo J, Sánchez-Pérez G, Yanai K. Fast and accurate real-time semantic segmentation with dilated asymmetric convolutions. In Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition. Institute of Electrical and Electronics Engineers Inc. 2020. p. 2264-2271. 9413176. (Proceedings - International Conference on Pattern Recognition). doi: 10.1109/ICPR48806.2021.9413176
