Atss-Net: Target Speaker Separation via Attention-Based Neural Network

Li, Tingle; Lin, Qingjian; Bao, Yuanyuan; Li, Ming

doi:10.21437/Interspeech.2020-1436

ISCA Archive Interspeech 2020

Recently, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based models have been introduced to deep learning-based target speaker separation. In this paper, we propose an Attention-based neural network (Atss-Net) in the spectrogram domain for this task. Compared with the CNN-LSTM architecture, it allows the network to compute the correlations between features in parallel and to extract more features with shallower layers. Experimental results show that Atss-Net outperforms VoiceFilter while using only half as many parameters. Furthermore, the proposed model shows promising performance in speech enhancement.
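The abstract's central architectural claim is that attention computes the correlations between all features in parallel, whereas an LSTM processes them step by step. The following is a minimal NumPy sketch of single-head scaled dot-product self-attention that illustrates where that parallelism comes from: every pairwise score is produced by a single matrix product. It assumes attention is taken across the time frames of a magnitude spectrogram. The projection matrices, the dimensions (`T`, `F`, `d`), the single-head layout and the function name `self_attention` are all illustrative assumptions, not the Atss-Net configuration reported in the paper.

```python
import numpy as np


def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray,
                   w_v: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:   (T, F) magnitude spectrogram, T frames by F frequency bins.
    w_q, w_k, w_v: (F, d) projection matrices (hypothetical; random here).
    Returns the attended features (T, d) and the attention weights (T, T).
    """
    q = x @ w_q  # queries, (T, d)
    k = x @ w_k  # keys, (T, d)
    v = x @ w_v  # values, (T, d)
    d = q.shape[-1]
    # All T x T frame-to-frame similarity scores come from one matrix
    # product, rather than from a sequential recurrence over frames.
    scores = q @ k.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


# Usage with random data standing in for a real spectrogram (assumption).
rng = np.random.default_rng(0)
T, F, d = 100, 257, 64
spec = np.abs(rng.standard_normal((T, F)))
w_q, w_k, w_v = (rng.standard_normal((F, d)) / np.sqrt(F) for _ in range(3))
out, attn = self_attention(spec, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (100, 64) (100, 100)
```

Because the score matrix covers every pair of frames at once, each output frame can draw on the whole utterance within a single layer. This is one plausible reading of why shallower attention layers can replace a deeper CNN-LSTM stack.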