The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02

Lin, Qingjian; Li, Tingle; Li, Ming

doi:10.21437/Interspeech.2020-1915

ISCA Archive Interspeech 2020

This paper describes the systems developed by the DKU team for the Fearless Steps Challenge Phase-02. For the Speech Activity Detection task, we start with a Long Short-Term Memory (LSTM) system and then extend it to a ResNet-LSTM system. Our ResNet-LSTM system reduces the DCF by about 38% relative to the LSTM baseline. We also discuss system performance when additional training corpora are included; the lowest DCF on the Eval Set, 1.406%, is achieved with system pre-training. For the Speaker Identification task, we employ the Deep ResNet vector system, which takes a variable-length feature sequence as input and directly generates speaker posteriors. Pre-training on VoxCeleb is also considered, and our best-performing system achieves a Top-5 accuracy of 92.393% on the Eval Set.
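The headline numbers above rest on two metrics, and a short Python sketch may help make them concrete. It assumes frame-level 0/1 labels (1 = speech) and a detection cost of 0.75·P_miss + 0.25·P_fa, the weighting we believe the Fearless Steps SAD evaluation plan uses (worth checking against the official plan); it also ignores any scoring collar the official tool may apply. Top-k accuracy counts a trial as correct when the true speaker is among the k highest posteriors. The helper names `sad_dcf`, `relative_reduction`, and `top_k_accuracy` are hypothetical and are not the authors' code.

```python
from typing import Sequence


def sad_dcf(reference: Sequence[int], hypothesis: Sequence[int],
            w_miss: float = 0.75, w_fa: float = 0.25) -> float:
    """Frame-level detection cost w_miss * P_miss + w_fa * P_fa.

    Labels are 1 for speech and 0 for non-speech. The default weights are an
    assumption taken from the challenge's SAD evaluation plan; no collar is applied.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    speech = sum(1 for r in reference if r == 1)
    nonspeech = len(reference) - speech
    # A miss is a speech frame labelled non-speech; a false alarm is the reverse.
    misses = sum(1 for r, h in zip(reference, hypothesis) if r == 1 and h == 0)
    false_alarms = sum(1 for r, h in zip(reference, hypothesis) if r == 0 and h == 1)
    p_miss = misses / speech if speech else 0.0
    p_fa = false_alarms / nonspeech if nonspeech else 0.0
    return w_miss * p_miss + w_fa * p_fa


def relative_reduction(baseline: float, system: float) -> float:
    """Relative error reduction of `system` with respect to `baseline`."""
    return (baseline - system) / baseline


def top_k_accuracy(posteriors: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int = 5) -> float:
    """Fraction of trials whose true speaker index is among the k highest posteriors."""
    if not labels:
        raise ValueError("need at least one trial")
    hits = 0
    for probs, label in zip(posteriors, labels):
        # Rank speaker indices by posterior, highest first (ties keep index order).
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    ref = [1, 1, 1, 1, 0, 0, 0, 0]
    hyp = [1, 1, 1, 0, 0, 0, 1, 0]  # one miss, one false alarm
    print(sad_dcf(ref, hyp))  # 0.75 * 0.25 + 0.25 * 0.25 = 0.25
    probs = [0.05, 0.4, 0.1, 0.2, 0.15, 0.1]
    print(top_k_accuracy([probs, probs], [4, 0], k=5))  # 0.5
```

As an illustration with hypothetical values, a drop from a DCF of 4.0% to 2.5% gives `relative_reduction(4.0, 2.5) == 0.375`, which is the sense in which a "38% relative" reduction is meant. Because Python's sort is stable, speakers with equal posteriors are ranked by index, so a different tie-breaking rule could shift Top-5 accuracy slightly.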