TY - JOUR
T1 - Multimodal learning based spatial relation identification
AU - Dash, Sandeep Kumar
AU - Sureshchandra, Y. V.
AU - Mishra, Yatharth
AU - Pakray, Partha
AU - Das, Ranjita
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © 2020 Instituto Politecnico Nacional. All rights reserved.
PY - 2020
Y1 - 2020
AB - Spatial relation identification is an integral part of spatial information retrieval. It deals with identifying spatially related objects based on their physical orientation or placement with respect to each other. The concept is widely used in fields such as robotics and image caption generation. This work gathers information from multiple modalities, namely an image and its corresponding text, to strengthen the learning process for identifying spatial relation pairs in a given text. Two multimodal approaches are proposed. In the first approach, information is explored as a sequential learning process in which the individual spatial roles are identified as connected entities, which makes spatial relation retrieval easier and more efficient. To counter the small size of the dataset and to avoid overfitting, an efficient backpropagation-based neural network was used to classify the candidate roles and relations. Feature selection differed for each classification task. Building on the features selected in the first approach, the second approach uses transfer learning: an existing image caption generation model retrieves the vital topic-based information from the image, which is then used for the task. Both approaches thus use information from two modalities to train the system. The model achieves state-of-the-art precision for the identification of two of the spatial roles. This validates the advantage of multimodal learning over other partial-multimodal processes.
KW - Multi layer perceptron
KW - Multimodal learning
KW - Spatial relation identification
KW - Spatial role labeling
UR - http://www.scopus.com/inward/record.url?scp=85095705556&partnerID=8YFLogxK
U2 - 10.13053/CYS-24-3-3773
DO - 10.13053/CYS-24-3-3773
M3 - Article
AN - SCOPUS:85095705556
SN - 1405-5546
VL - 24
SP - 1327
EP - 1335
JO - Computacion y Sistemas
JF - Computacion y Sistemas
IS - 3
ER -