Computer Speech and Language - Special issue on Speaker and language characterization and recognition: voice modeling, conversion, synthesis and ethical aspects

Artificial Intelligence

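Abstract deadline:

Full paper deadline: 2018-11-15

Impact factor: 2.116

Journal difficulty:

CCF classification: Class C

Chinese Academy of Sciences JCR ranking:

• Major category: Computer Science - Zone 2

• Subcategory: Computer Science: Artificial Intelligence - Zone 3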

Overview

Voice is one of the most casual modalities for natural and intuitive interaction between humans, as well as between humans and machines. Voice is also a central part of our identity. Voice-based solutions are currently deployed in a growing variety of applications, including person authentication: voice offers a low-cost biometric solution through automatic speaker verification (ASV). A related technology concerns the digital cloning of personal voice characteristics for text-to-speech (TTS) and voice conversion (VC). In recent years, impressive advances in the VC/TTS field have opened the way for numerous new consumer applications. In particular, VC offers new solutions for privacy protection. However, VC/TTS also brings the possibility of misusing the technology to spoof ASV systems (for example, presentation attacks implemented using voice conversion). As a direct consequence, spoofing countermeasures have attracted growing interest in recent years. Moreover, voice also conveys characteristics of a person other than identity, which could be extracted with or without the speaker's consent. This raises the need to address not only the technical challenges in ASV and VC/TTS but also specific ethical considerations, as illustrated, for example, by the recent General Data Protection Regulation (GDPR).

The Speaker Odyssey 2018 workshop took place in Les Sables d’Olonne, France, in June 2018 and gathered about 130 participants. The 55 accepted articles and the three keynotes showed the recent progress made in speaker modelling, a central theme in all the above topics. After two decades driven by Gaussian mixture modeling (associated more recently with subspace models), deep learning has clearly opened up new horizons. The Voice Conversion Challenge special session and several other sessions on spoofing, spoofing countermeasures and VC/TTS demonstrated the interest in studying the links between ASV, VC and TTS. Finally, one of the keynote talks and several presentations illustrated the growing interest in questions of security, privacy and ethics.

Building on the success of the Speaker Odyssey 2018 workshop, we invite novel research for this special issue on topics from the following non-exhaustive list:

- Speaker modelling and characterization (deep approaches and alternatives)

- Voice conversion and speaker-specific TTS

- Robustness to degraded channels, noise and low-bandwidth speech

- Vulnerability to spoofing attacks and advanced spoofing countermeasures

- Speaker de-identification, disguise, evasion, obfuscation and impersonation

- Beneficial links between ASV, VC, TTS and spoofing/anti-spoofing

- Objective and subjective measures of voice similarity and speech quality

- Speaker template protection and encrypted-domain ASV

- Limits and possibilities of VC and ASV in terms of security and privacy

- Ethics of ASV, VC and TTS and interrelation of technology with GDPR