TY - GEN
T1 - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis
AU - Han, Wei
AU - Chen, Hui
AU - Gelbukh, Alexander
AU - Zadeh, Amir
AU - Morency, Louis-Philippe
AU - Poria, Soujanya
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/10/18
Y1 - 2021/10/18
AB - Multimodal sentiment analysis aims to extract and integrate semantic information from multiple modalities in order to recognize the emotions and sentiment expressed in multimodal data. The central challenge in this research area is designing an effective fusion scheme that can extract and integrate key information from the various modalities. However, previous work is limited because it does not leverage the dynamics of independence and correlation between modalities to reach top performance. To mitigate this, we propose the Bi-Bimodal Fusion Network (BBFN), a novel end-to-end network that performs fusion (relevance increment) and separation (difference increment) on pairwise modality representations. The two parts are trained simultaneously so that the competition between them is simulated. Because of the known information imbalance among modalities, the model takes two bimodal pairs as input. In addition, we leverage a gated control mechanism in the Transformer architecture to further improve the final output. Experimental results on three datasets (CMU-MOSI, CMU-MOSEI, and UR-FUNNY) verify that our model significantly outperforms the state of the art (SOTA). The implementation of this work is available at https://github.com/declare-lab/multimodal-deep-learning and https://github.com/declare-lab/BBFN.
KW - cross-modal processing
KW - multimodal fusion
KW - multimodal representations
UR - http://www.scopus.com/inward/record.url?scp=85119017553&partnerID=8YFLogxK
U2 - 10.1145/3462244.3479919
DO - 10.1145/3462244.3479919
M3 - Conference contribution
AN - SCOPUS:85119017553
T3 - ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction
SP - 6
EP - 15
BT - ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction
PB - Association for Computing Machinery, Inc.
T2 - 23rd ACM International Conference on Multimodal Interaction, ICMI 2021
Y2 - 18 October 2021 through 22 October 2021
ER -