Trustworthy Cross-Modal Reasoning for Video-Language Understanding
Abstract deadline:
Full-paper deadline: 2024-04-30
Impact factor: 3.121
Journal difficulty:
CCF rank: B
CAS JCR ranking:
• Major category: Computer Science - Q3
• Subcategory: Computer Science, Artificial Intelligence - Q3
• Subcategory: Engineering, Electrical & Electronic - Q3
Overview
Video-language understanding and reasoning are long-standing problems in the CV and Multimedia communities. By endowing a machine with cross-modal reasoning ability for video-language understanding, AI researchers expect it to “think” like a human and then make trustworthy decisions. However, most existing efforts primarily aim to improve in-domain performance while overlooking how to truly capture the essence of cross-modal reasoning. In particular, a fundamental question in video-language understanding is usually neglected and has yet to be well answered: does the model merely learn multimodal correlations hidden in the datasets, and can its in-domain results therefore be trusted? This special issue accordingly covers the continuing growth of research on the robustness, fairness, explainability, and security of video-oriented cross-modal reasoning.
The purpose of this special issue is to solicit high-quality, high-impact, and original papers on current developments in cross-modal reasoning for video-language understanding. Topics of interest include, but are not limited to, the following:
New datasets for trustworthy video-language understanding
Adversarial learning for robust multimodal representation
New methods for robust video summarization
Cross-modal semantics-consistent representation learning
Domain generalization in video-language understanding
Causal learning for trustworthy multimodal reasoning
Unfair bias measurement and mitigation in video-language understanding
Explainable multimodal data fusion and interaction
Brain-inspired networks for explainable cross-modal reasoning
Trustworthy reasoning algorithms in video dialogue
Knowledge-driven explainable cross-modal reasoning
Text-guided visual-textual reasoning and generation
Privacy protection and security control in cross-modal AIGC
Applications of trustworthy video-language understanding
Guest editors:
Dan Guo, PhD, Hefei University of Technology, Hefei, China
Zhun Zhong, PhD, University of Nottingham, Nottingham, United Kingdom
Subhankar Roy, PhD, Télécom Paris, Paris, France
Linchao Zhu, PhD, Zhejiang University, Hangzhou, China
Chuang Gan, PhD, UMass Amherst, Amherst, United States of America; MIT-IBM Watson AI Lab, Cambridge, United States of America
Meng Wang, PhD, Hefei University of Technology, Hefei, China