Analysis of Depth and Semantic Mask for Perceiving a Physical Environment Using Virtual Samples Generated by a GAN

Javier Maldonado-Romo; Mario Aldape-Perez; Alejandro Rodriguez-Molina

doi:10.1109/ACCESS.2021.3137797

Centro de Innovación y Desarrollo Tecnológico en Cómputo (CIDETEC)

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Micro aerial vehicles (MAVs) can explore 3D environments using technologies that perceive the environment in order to map it and estimate the location of objects that could cause collisions, such as Simultaneous Localization and Mapping (SLAM). However, the agent must move while mapping the environment, which reduces the flying time available for additional activities. Adding more devices (sensors) to MAVs also implies more power consumption, and because more energy is then required to perform tasks, increasing the dimensions of MAVs limits the flying time. In contrast, Generative Adversarial Networks (GANs) have proven useful for translating images from one domain to another, but GAN domain translation requires a large number of samples. Therefore, an interoperability coefficient is employed to determine the minimum number of samples needed to connect the different domains. To validate the coefficient, the performance of depth and semantic mask estimation is analyzed for authentic and virtual samples using a limited number of samples. Consequently, an RGB-D sensor can be replaced by a GAN-based approach that uses a few samples of a real scenario. Although a GAN can create images with depth and semantic mask information, an additional problem must be tackled: the presence of intrinsic noise, for which a simple GAN architecture is not enough. The performance of the proposed solution is compared against a physical RGB-D sensor (Microsoft Kinect V1) and other state-of-the-art approaches. Experimental results indicate that this proposal is a viable option to replace a physical RGB-D sensor with limited information.

Original language	English
Pages (from-to)	5595-5607
Number of pages	13
Journal	IEEE Access
Volume	10
DOIs	https://doi.org/10.1109/ACCESS.2021.3137797
State	Published - 2022

Keywords

3D mapping
Computer vision
machine learning
perception environment

Cite this

@article{c22459e269ae4f33888254175afd0741,

title = "Analysis of Depth and Semantic Mask for Perceiving a Physical Environment Using Virtual Samples Generated by a GAN",

abstract = "Micro aerial vehicles (MAVs) can explore 3D environments using technologies that perceive the environment in order to map it and estimate the location of objects that could cause collisions, such as Simultaneous Localization and Mapping (SLAM). However, the agent must move while mapping the environment, which reduces the flying time available for additional activities. Adding more devices (sensors) to MAVs also implies more power consumption, and because more energy is then required to perform tasks, increasing the dimensions of MAVs limits the flying time. In contrast, Generative Adversarial Networks (GANs) have proven useful for translating images from one domain to another, but GAN domain translation requires a large number of samples. Therefore, an interoperability coefficient is employed to determine the minimum number of samples needed to connect the different domains. To validate the coefficient, the performance of depth and semantic mask estimation is analyzed for authentic and virtual samples using a limited number of samples. Consequently, an RGB-D sensor can be replaced by a GAN-based approach that uses a few samples of a real scenario. Although a GAN can create images with depth and semantic mask information, an additional problem must be tackled: the presence of intrinsic noise, for which a simple GAN architecture is not enough. The performance of the proposed solution is compared against a physical RGB-D sensor (Microsoft Kinect V1) and other state-of-the-art approaches. Experimental results indicate that this proposal is a viable option to replace a physical RGB-D sensor with limited information.",

keywords = "3D mapping, Computer vision, machine learning, perception environment",

author = "Javier Maldonado-Romo and Mario Aldape-Perez and Alejandro Rodriguez-Molina",

note = "Publisher Copyright: {\textcopyright} 2021 IEEE.",

year = "2022",

doi = "10.1109/ACCESS.2021.3137797",

language = "English",

volume = "10",

pages = "5595--5607",

journal = "IEEE Access",

issn = "2169-3536",

}

TY - JOUR

T1 - Analysis of Depth and Semantic Mask for Perceiving a Physical Environment Using Virtual Samples Generated by a GAN

AU - Maldonado-Romo, Javier

AU - Aldape-Perez, Mario

AU - Rodriguez-Molina, Alejandro

PY - 2022

Y1 - 2022

AB - Micro aerial vehicles (MAVs) can explore 3D environments using technologies that perceive the environment in order to map it and estimate the location of objects that could cause collisions, such as Simultaneous Localization and Mapping (SLAM). However, the agent must move while mapping the environment, which reduces the flying time available for additional activities. Adding more devices (sensors) to MAVs also implies more power consumption, and because more energy is then required to perform tasks, increasing the dimensions of MAVs limits the flying time. In contrast, Generative Adversarial Networks (GANs) have proven useful for translating images from one domain to another, but GAN domain translation requires a large number of samples. Therefore, an interoperability coefficient is employed to determine the minimum number of samples needed to connect the different domains. To validate the coefficient, the performance of depth and semantic mask estimation is analyzed for authentic and virtual samples using a limited number of samples. Consequently, an RGB-D sensor can be replaced by a GAN-based approach that uses a few samples of a real scenario. Although a GAN can create images with depth and semantic mask information, an additional problem must be tackled: the presence of intrinsic noise, for which a simple GAN architecture is not enough. The performance of the proposed solution is compared against a physical RGB-D sensor (Microsoft Kinect V1) and other state-of-the-art approaches. Experimental results indicate that this proposal is a viable option to replace a physical RGB-D sensor with limited information.

KW - 3D mapping

KW - Computer vision

KW - machine learning

KW - perception environment

UR - http://www.scopus.com/inward/record.url?scp=85122081220&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2021.3137797

DO - 10.1109/ACCESS.2021.3137797

M3 - Article

AN - SCOPUS:85122081220

SN - 2169-3536

VL - 10

SP - 5595

EP - 5607

JO - IEEE Access

JF - IEEE Access

ER -
