Two-stage Semi-supervised Speaker Recognition with Gated Label Learning

Two-stage Semi-supervised Speaker Recognition with Gated Label Learning

Xingmei Wang, Jiaxiang Meng, Kong Aik Lee, Boquan Li, Jinghan Liu

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 6495-6503. https://doi.org/10.24963/ijcai.2024/718

Speaker recognition technologies have been successfully applied in diverse domains, benefiting from the advance of deep learning. Nevertheless, current efforts are still subject to the lack of labeled data. Such issues have been attempted in computer vision, through semi-supervised learning (SSL) that assigns pseudo labels for unlabeled data, undertaking the role of labeled ones. Through our empirical evaluations, the state-of-the-art SSL methods show unsatisfactory performance in speaker recognition tasks, due to the imbalance between the quantity and quality of pseudo labels. Therefore, in this work, we propose a two-stage SSL framework, with the aim to address the data scarcity challenge. We first construct an initial contrastive learning network, where the encoder outputs the embedding representation of utterances. Furthermore, we construct an iterative holistic semi-supervised learning network that involves a clustering strategy to assign pseudo labels, and a gated label learning (GLL) strategy to further select reliable pseudo-label data. Systematical evaluations show that our proposed framework achieves superior performance in speaker recognition than the state-of-the-art methods, matching the performance of supervised learning.
Keywords:
Natural Language Processing: NLP: Speech
Machine Learning: ML: Semi-supervised learning