Special issue on Pattern Recognition Methods for Bridging the Vision-Language Gap in Multimodal Data Analysis
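• Major category: Engineering and Technology - Zone 2
• Subcategory: Computer Science, Artificial Intelligence - Zone 2
• Subcategory: Engineering, Electrical and Electronic - Zone 2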
The explosive growth of visual and textual data (both on the World Wide Web and held in private repositories by diverse institutions and companies) has created an urgent need for machines to search, process and understand multimedia content. Solutions for accessing and understanding such multimodal source data depend on bridging the semantic gap between vision and language. Solving this problem calls for expertise from the cognate fields of computer vision, image processing, text and document analysis, machine learning and pattern recognition. The problem also finds applications in the fast-emerging areas of multimedia data analysis and cross-modality learning.
In this special issue, we aim to assemble recent advances in pattern recognition relevant to the vision-and-language problem, encompassing big-data applications involving multimedia data and deep learning algorithms. The scope of the call for papers covers the use of pattern recognition and machine learning techniques for understanding cross-modal information, especially information involving vision and language. Both original research and state-of-the-art literature reviews are welcome for submission. However, submitted papers must be within the scope of the Pattern Recognition Journal and must advance the available pattern recognition methodology in this domain. Papers outside the remit of the journal will be rejected without review. The list of possible topics includes, but is not limited to:
• Novel pattern recognition and machine learning methods which combine language and vision
• Pattern recognition and machine learning for visual captioning, dialogue, and question answering
• Sequence learning towards bridging vision, language and multimedia data
• Language as an inference mechanism for structuring and reasoning about visual perception
• Transfer learning across multimodal data
• Pattern recognition for visual synthesis from language
• Semantic scene graph generation from images with pattern recognition and machine learning methods
• Cross-modality pattern recognition and machine learning for representation and learning, retrieval and generation, and zero/few-shot learning
• Pattern recognition and machine learning for multimedia data analysis and understanding