Journal of Visual Communication and Image Representation - Special Issue: Integrating Vision and Language for Semantic Knowledge Reasoning and Transfer

Graphics and Multimedia

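Abstract deadline:

Full paper deadline: 2019-08-01

Impact factor: 2.479

Journal difficulty:

CCF classification: Class C

Chinese Academy of Sciences JCR partition:

• Major category: Computer Science - Zone 3

• Subcategory: Computer Science, Information Systems - Zone 3

• Subcategory: Computer Science, Software Engineering - Zone 3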

Overview

Due to the explosive growth of visual and textual data (e.g., images, video, blogs) on the Internet and the urgent need to jointly understand such heterogeneous data, integrating vision and language to bridge the semantic gap has attracted a huge amount of interest from the computer vision and natural language processing communities. Great efforts have been made to study the intersection of vision and language, and notable applications include (i) generating image descriptions using natural language, (ii) visual question answering, (iii) retrieval of images based on textual queries (and vice versa), (iv) generating images/videos from textual descriptions, (v) language grounding and many other related topics.

Despite this recent surge of activity, the problem remains challenging because reasoning about the connections between visual content and linguistic words is difficult. Reasoning is based on semantic knowledge: understanding a word (for example, “swan”) involves reasoning over external knowledge about that word (e.g., what swans look like, the sounds they make, how they behave and what their skin feels like). Although recent studies often claim reasoning ability, most “reasoning” simply uncovers latent connections between visual elements and textual/semantic facts during training on manually annotated datasets with a large number of image-text pairs. Furthermore, recent studies are often specific to particular datasets and lack generalization ability, i.e., the semantic knowledge obtained from one dataset cannot be directly transferred to other datasets, as each benchmark has its own characteristics. One potential solution is to leverage external knowledge resources (e.g., social-media sites, expert systems and Wikipedia) as an intermediate bridge for knowledge transfer. However, it remains unclear how to appropriately incorporate such comprehensive knowledge resources for more effective knowledge-based reasoning and transfer across datasets. From the broader perspective of applications, integrating vision and language for knowledge reasoning and transfer has yet to be well explored in existing research.

Topics of Interest:

This special issue invites researchers and practitioners from both academia and industry to explore how advanced learning models and systems can be leveraged to address the challenges in semantic knowledge reasoning and transfer for the joint understanding of vision and language. It provides a forum to publish recent state-of-the-art research findings, methodologies, technologies and services in vision-language technology for practical applications. We invite original, high-quality submissions addressing all aspects of this field, which is closely related to multimedia search, multi-modal learning, cross-media analysis, cross-knowledge transfer and so on.

Topics of interest include, but are not limited to:

· Big data storage, indexing, and searching

· Deep learning methods for vision and language

· Transfer learning for vision and language

· Cross-media analysis (retrieval, hashing, transfer, reasoning, etc.)

· Multi-modal learning and semantic representation learning

· Learning knowledge graph over multi-modal data

· Generating image/video descriptions using natural language

· Visual question answering/generation

· Retrieval of images based on textual queries (and vice versa)

· Generating images/videos from textual descriptions

· Language grounding