Segment-Level Effects of Gender, Nationality and Emotion Information on Text-Independent Speaker Verification

Li, Kai; Akagi, Masato; Wu, Yibo; Dang, Jianwu

doi:10.21437/Interspeech.2020-1700

ISCA Archive Interspeech 2020

Speaker embeddings extracted from neural networks (NNs) achieve excellent performance on general speaker verification (SV) tasks. Most current SV systems use only speaker labels, so interactions between different types of domain information can decrease the prediction accuracy of SV. To overcome this weakness and improve SV performance, we propose four SV systems that use gender, nationality, and emotion information to add more constraints in the NN training stage. More specifically, multitask learning (MTL)-based systems, including multitask gender (MTG), multitask nationality (MTN), and multitask gender and nationality (MTGN), were used to enhance the learning of gender and nationality information. A domain adversarial training (DAT)-based system, emotion domain adversarial training (EDAT), was used to suppress the learning of emotion information. Experimental results indicate that encouraging the learning of gender and nationality information and suppressing the learning of emotion information both improve SV performance. Overall, the proposed systems achieved relative improvements in equal error rate of 16.4% and 22.9% for the MTL- and DAT-based systems, respectively.
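
The contrast between the two training strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the auxiliary weights, and all numeric values are hypothetical, and the abstract does not state how the losses are weighted. The sketch assumes the common formulation in which multitask learning adds weighted auxiliary losses to the speaker loss, while domain adversarial training reverses the sign of the auxiliary gradient before it reaches the shared encoder.

```python
from typing import List, Sequence


def multitask_loss(speaker_loss: float,
                   aux_losses: Sequence[float],
                   aux_weights: Sequence[float]) -> float:
    """Speaker loss plus weighted auxiliary losses (e.g. gender, nationality).

    The weights are illustrative assumptions, not values from the paper.
    """
    return speaker_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))


def shared_encoder_gradient(speaker_grad: Sequence[float],
                            aux_grad: Sequence[float],
                            weight: float,
                            adversarial: bool = False) -> List[float]:
    """Combine the per-task gradients that arrive at the shared encoder.

    With adversarial=False (multitask learning) the auxiliary gradient is
    added, so the encoder is pushed to keep the auxiliary information.
    With adversarial=True (gradient reversal, as commonly used in domain
    adversarial training) its sign is flipped, so the encoder is pushed
    to discard that information.
    """
    sign = -1.0 if adversarial else 1.0
    return [gs + sign * weight * ga for gs, ga in zip(speaker_grad, aux_grad)]


def relative_improvement(baseline_eer: float, system_eer: float) -> float:
    """Relative reduction in equal error rate, in percent."""
    return 100.0 * (baseline_eer - system_eer) / baseline_eer


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(multitask_loss(1.0, [0.5, 0.25], [0.2, 0.4]))
    print(shared_encoder_gradient([1.0, -2.0], [0.5, 0.5], 0.5))
    print(shared_encoder_gradient([1.0, -2.0], [0.5, 0.5], 0.5, adversarial=True))
    print(relative_improvement(5.0, 4.0))
```

In this sketch the only difference between the MTL-style and DAT-style updates is the sign applied to the auxiliary gradient, which matches the abstract's description of encouraging gender and nationality information while suppressing emotion information.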