FASSD-Net model for person semantic segmentation

Luis Brandon Garcia-Ortiz; Jose Portillo-Portillo; Aldo Hernandez-Suarez; Jesus Olivares-Mercado; Gabriel Sanchez-Perez; Karina Toscano-Medina; Hector Perez-Meana; Gibran Benitez-Garcia

doi:10.3390/electronics10121393

Escuela Superior de Ingeniería Mecánica y Eléctrica (ESIME), Unidad Culhuacán

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

This paper proposes the use of the FASSD-Net model for semantic segmentation of human silhouettes. These silhouettes can later be used in various applications that require specific characteristics of human interaction observed in video sequences, for the understanding of human activities or for human identification. These applications are classified as high-level semantic understanding tasks. Since semantic segmentation is presented as one solution for human silhouette extraction, it is concluded that convolutional neural networks (CNNs) have a clear advantage over traditional computer vision methods, based on their ability to learn feature representations appropriate for the segmentation task. In this work, the FASSD-Net model is used as a novel proposal that promises real-time segmentation of high-resolution images at more than 20 FPS. To evaluate the proposed scheme, we use the Cityscapes database, which consists of varied scenarios that represent human interaction with the environment; segmenting people in these scenarios is difficult, which favors the evaluation of our proposal. To adapt the FASSD-Net model to human silhouette semantic segmentation, the indexes of the 19 classes traditionally proposed for Cityscapes were modified, leaving only two labels: one for the class of interest, labeled as person, and one for the background. The Cityscapes database includes the category “human”, composed of the “rider” and “person” classes, in which the rider class contains incomplete human silhouettes due to self-occlusions caused by the activity or the transport used. For this reason, we train the model using only the person class rather than the human category. The implementation of the FASSD-Net model with only two classes shows promising results, both qualitatively and quantitatively, for the segmentation of human silhouettes.
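The reduction from 19 classes to two labels described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `to_person_mask`, the output indexes 0 and 1, and the handling of ignored pixels are assumptions made for illustration. The train IDs 11 (person), 12 (rider) and 255 (ignore) come from the public Cityscapes label table. Rider pixels are assumed to fall into the background here, since the abstract states that only the person class is used.

```python
# Cityscapes train IDs for the 19-class benchmark (public label table);
# 255 marks pixels that are ignored during evaluation.
PERSON_TRAIN_ID = 11
RIDER_TRAIN_ID = 12
IGNORE_ID = 255

# Two-label scheme from the abstract; the index values are assumptions.
BACKGROUND = 0
PERSON = 1


def to_person_mask(label_map):
    """Collapse a 2D grid of 19-class train IDs into person/background labels."""
    def remap(train_id):
        if train_id == PERSON_TRAIN_ID:
            return PERSON
        if train_id == IGNORE_ID:
            return IGNORE_ID  # assumption: ignored pixels stay ignored
        return BACKGROUND  # every other class, including rider

    return [[remap(t) for t in row] for row in label_map]


if __name__ == "__main__":
    # Toy 2x3 label map: road, person, rider / ignore, person, car.
    toy = [[0, PERSON_TRAIN_ID, RIDER_TRAIN_ID], [IGNORE_ID, PERSON_TRAIN_ID, 13]]
    for row in to_person_mask(toy):
        print(row)
```

In a real pipeline the same mapping would more likely be applied to whole label images with an array library; the nested-list version only keeps the sketch dependency-free.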

Original language	English
Article number	1393
Journal	Electronics (Switzerland)
Volume	10
Issue number	12
DOIs	https://doi.org/10.3390/electronics10121393
State	Published - 2 Jun 2021

Keywords

Cityscapes
Deep learning
Human silhouette
Person class
Semantic segmentation

Cite this

@article{0c2e747d64c943dca9807cb0094123b3,

title = "FASSD-Net model for person semantic segmentation",

abstract = "This paper proposes the use of the FASSD-Net model for semantic segmentation of human silhouettes. These silhouettes can later be used in various applications that require specific characteristics of human interaction observed in video sequences, for the understanding of human activities or for human identification. These applications are classified as high-level semantic understanding tasks. Since semantic segmentation is presented as one solution for human silhouette extraction, it is concluded that convolutional neural networks (CNNs) have a clear advantage over traditional computer vision methods, based on their ability to learn feature representations appropriate for the segmentation task. In this work, the FASSD-Net model is used as a novel proposal that promises real-time segmentation of high-resolution images at more than 20 FPS. To evaluate the proposed scheme, we use the Cityscapes database, which consists of varied scenarios that represent human interaction with the environment; segmenting people in these scenarios is difficult, which favors the evaluation of our proposal. To adapt the FASSD-Net model to human silhouette semantic segmentation, the indexes of the 19 classes traditionally proposed for Cityscapes were modified, leaving only two labels: one for the class of interest, labeled as person, and one for the background. The Cityscapes database includes the category “human”, composed of the “rider” and “person” classes, in which the rider class contains incomplete human silhouettes due to self-occlusions caused by the activity or the transport used. For this reason, we train the model using only the person class rather than the human category. The implementation of the FASSD-Net model with only two classes shows promising results, both qualitatively and quantitatively, for the segmentation of human silhouettes.",

keywords = "Cityscapes, Deep learning, Human silhouette, Person class, Semantic segmentation",

author = "Garcia-Ortiz, {Luis Brandon} and Portillo-Portillo, Jose and Hernandez-Suarez, Aldo and Olivares-Mercado, Jesus and Sanchez-Perez, Gabriel and Toscano-Medina, Karina and Perez-Meana, Hector and Benitez-Garcia, Gibran",

note = "Publisher Copyright: {\textcopyright} 2021 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2021",

month = jun,

day = "2",

doi = "10.3390/electronics10121393",

language = "English",

volume = "10",

journal = "Electronics (Switzerland)",

issn = "2079-9292",

publisher = "Multidisciplinary Digital Publishing Institute",

number = "12",

}

TY - JOUR

T1 - FASSD-Net model for person semantic segmentation

AU - Garcia-Ortiz, Luis Brandon

AU - Portillo-Portillo, Jose

AU - Hernandez-Suarez, Aldo

AU - Olivares-Mercado, Jesus

AU - Sanchez-Perez, Gabriel

AU - Toscano-Medina, Karina

AU - Perez-Meana, Hector

AU - Benitez-Garcia, Gibran

PY - 2021/6/2

Y1 - 2021/6/2

N2 - This paper proposes the use of the FASSD-Net model for semantic segmentation of human silhouettes. These silhouettes can later be used in various applications that require specific characteristics of human interaction observed in video sequences, for the understanding of human activities or for human identification. These applications are classified as high-level semantic understanding tasks. Since semantic segmentation is presented as one solution for human silhouette extraction, it is concluded that convolutional neural networks (CNNs) have a clear advantage over traditional computer vision methods, based on their ability to learn feature representations appropriate for the segmentation task. In this work, the FASSD-Net model is used as a novel proposal that promises real-time segmentation of high-resolution images at more than 20 FPS. To evaluate the proposed scheme, we use the Cityscapes database, which consists of varied scenarios that represent human interaction with the environment; segmenting people in these scenarios is difficult, which favors the evaluation of our proposal. To adapt the FASSD-Net model to human silhouette semantic segmentation, the indexes of the 19 classes traditionally proposed for Cityscapes were modified, leaving only two labels: one for the class of interest, labeled as person, and one for the background. The Cityscapes database includes the category “human”, composed of the “rider” and “person” classes, in which the rider class contains incomplete human silhouettes due to self-occlusions caused by the activity or the transport used. For this reason, we train the model using only the person class rather than the human category. The implementation of the FASSD-Net model with only two classes shows promising results, both qualitatively and quantitatively, for the segmentation of human silhouettes.

AB - This paper proposes the use of the FASSD-Net model for semantic segmentation of human silhouettes. These silhouettes can later be used in various applications that require specific characteristics of human interaction observed in video sequences, for the understanding of human activities or for human identification. These applications are classified as high-level semantic understanding tasks. Since semantic segmentation is presented as one solution for human silhouette extraction, it is concluded that convolutional neural networks (CNNs) have a clear advantage over traditional computer vision methods, based on their ability to learn feature representations appropriate for the segmentation task. In this work, the FASSD-Net model is used as a novel proposal that promises real-time segmentation of high-resolution images at more than 20 FPS. To evaluate the proposed scheme, we use the Cityscapes database, which consists of varied scenarios that represent human interaction with the environment; segmenting people in these scenarios is difficult, which favors the evaluation of our proposal. To adapt the FASSD-Net model to human silhouette semantic segmentation, the indexes of the 19 classes traditionally proposed for Cityscapes were modified, leaving only two labels: one for the class of interest, labeled as person, and one for the background. The Cityscapes database includes the category “human”, composed of the “rider” and “person” classes, in which the rider class contains incomplete human silhouettes due to self-occlusions caused by the activity or the transport used. For this reason, we train the model using only the person class rather than the human category. The implementation of the FASSD-Net model with only two classes shows promising results, both qualitatively and quantitatively, for the segmentation of human silhouettes.

KW - Cityscapes

KW - Deep learning

KW - Human silhouette

KW - Person class

KW - Semantic segmentation

UR - http://www.scopus.com/inward/record.url?scp=85107477056&partnerID=8YFLogxK

U2 - 10.3390/electronics10121393

DO - 10.3390/electronics10121393

M3 - Article

AN - SCOPUS:85107477056

SN - 2079-9292

VL - 10

JO - Electronics (Switzerland)

JF - Electronics (Switzerland)

IS - 12

M1 - 1393

ER -
