Computer Speech and Language - Special issue on Two decades into Speaker Recognition Evaluation - are we there yet?

Artificial Intelligence

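Abstract deadline:

Full paper deadline: 2019-04-15

Impact factor: 2.116

Journal difficulty:

CCF category: C

Chinese Academy of Sciences JCR partition:

• Major category: Computer Science - Zone 2

• Subcategory: Computer Science: Artificial Intelligence - Zone 3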

Overview

Automatic speaker recognition is the task of identifying or verifying an individual’s identity from their voice samples using machine learning algorithms, without any human intervention. It has seen significant advances over the past few decades, leading to the successful introduction of commercial products. The earliest paper reporting an investigation into the reliability of sound spectrograms, dubbed “voiceprints” by analogy with fingerprints, was published in 1970, following a number of over-optimistic claims in the 1960s. It was not until 1996 that the U.S. National Institute of Standards and Technology (NIST) began holding regular formal speaker recognition evaluations (SRE). These competitive evaluations provide a common platform and testbed for exploring promising new ideas in speaker recognition, as well as for measuring the performance of state-of-the-art speaker recognition technology. Two decades of systematic and open competitive evaluations have undoubtedly helped provide a credible indication that speaker recognition is a reliable and testable technology for person authentication.

With the advent of Big Data and the resurgence of data-hungry modeling techniques such as artificial neural networks, the research focus has recently shifted from controlled scenarios towards larger and more realistic speaker-in-the-wild scenarios. The latest cycle of NIST evaluations (SRE’18), which in addition to traditional conversational telephony speech (CTS) involves voice over IP (VOIP) data as well as audio extracted from online videos, serves as a good checkpoint. This special issue aims to compile the latest technical advances and similar efforts contributing in this direction.

The goal of this special issue is to bring together researchers in speaker recognition and related fields, with the aim of providing the readership of Elsevier's Computer Speech and Language with up-to-date papers on recent advances in evaluations, databases, implementation, algorithms, and theoretical perspectives on the state of the art in speaker recognition. Submissions offering comprehensive descriptions and analyses of large-scale implementations for benchmarking and commercial applications, with a focus on perspectives of interest to the speaker recognition community, are encouraged. Please contact the Guest Editors if you have any questions about whether your proposed article would fit the scope of this special issue.

Topics of interest include (but are not limited to):

- Performance evaluation metrics

- Large-scale datasets for speaker recognition

- Large-scale implementation of speaker recognition systems

- Speaker embeddings: theory and practice

- Domain adaptation in speaker recognition

- Unsupervised calibration

- Speaker recognition for multi-party conversations

- Deep learning in speaker recognition

- Transfer learning in speaker recognition

- Voice biometric standardization and open formats

- Voice data privacy and protection

- Open-source toolkits for speaker recognition