Universal Sound Separation with Self-Supervised Audio Masked Autoencoder

Zhao, Junqi; Liu, Xubo; Zhao, Jinzheng; Yuan, Yi; Kong, Qiuqiang; Plumbley, Mark D.; Wang, Wenwu

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2407.11745 (eess)

[Submitted on 16 Jul 2024 (v1), last revised 6 Nov 2024 (this version, v2)]

Abstract: Universal sound separation (USS) is the task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging deep learning approach that leverages unlabeled data to obtain task-agnostic representations, which can benefit many downstream tasks. In this paper, we propose integrating a self-supervised pre-trained model, namely the audio masked autoencoder (A-MAE), into a universal sound separation system to enhance its separation performance. We employ two strategies to utilize SSL embeddings: freezing or updating the parameters of A-MAE during fine-tuning. The SSL embeddings are concatenated with the short-time Fourier transform (STFT) to serve as input features for the separation model. We evaluate our methods on the AudioSet dataset, and the experimental results indicate that the proposed methods successfully enhance the separation performance of a state-of-the-art ResUNet-based USS model.
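
The feature fusion described in the abstract (SSL embeddings concatenated with the STFT) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the function names, the window and hop sizes, the 768-dimensional embedding, and the nearest-neighbour time alignment are all hypothetical choices, and the embedding is a random placeholder rather than real A-MAE output.

```python
import numpy as np

def stft_magnitude(wave, n_fft=512, hop=160):
    """Magnitude STFT with a Hann window; returns (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))

def placeholder_ssl_embedding(wave, n_steps=50, dim=768, seed=0):
    """Hypothetical stand-in for a pre-trained encoder.

    Returns random features of shape (n_steps, dim); it does NOT run A-MAE.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_steps, dim))

def align_time(emb, n_frames):
    """Nearest-neighbour resampling of the embedding time axis to n_frames."""
    idx = np.round(np.linspace(0, emb.shape[0] - 1, n_frames)).astype(int)
    return emb[idx]

def fuse_features(wave):
    """Concatenate STFT magnitudes and time-aligned embeddings per frame."""
    spec = stft_magnitude(wave)
    emb = align_time(placeholder_ssl_embedding(wave), spec.shape[0])
    return np.concatenate([spec, emb], axis=-1)

# One second of synthetic noise at an assumed 16 kHz sampling rate.
wave = np.random.default_rng(1).standard_normal(16000)
features = fuse_features(wave)
print(features.shape)
```

In a trainable framework, the two strategies named in the abstract would presumably differ only in whether the encoder's parameters receive gradient updates during fine-tuning; the fused input to the separation model would have the same shape either way.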

Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
Cite as:	arXiv:2407.11745 [eess.AS]
	(or arXiv:2407.11745v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2407.11745

Submission history

From: Junqi Zhao
[v1] Tue, 16 Jul 2024 14:11:44 UTC (1,586 KB)
[v2] Wed, 6 Nov 2024 17:52:55 UTC (1,586 KB)
