{"title":"Audio Source Separation using Wave-U-Net with Spectral Loss","authors":"Varun Patkar, Tanish Parmar, Parth Narvekar, Vedant Pawar, Joanne Gomes","doi":"10.1109/CSCITA55725.2023.10104853","DOIUrl":null,"url":null,"abstract":"Existing Audio Source Separation models usually operate using magnitude spectrum and neglect the phase information which results in long-range temporal correlations because of its high sampling rates. Audio source separation has been a problem since long and only a handful of solutions have been presented for it. This research work presents a Wave-U-Net architecture with Spectral Loss Function which separates input audio into multiple audio file of different instrument sounds along with vocals. Existing Wave-U-Net Architecture with Mean Square Error (MSE) loss function provides poor quality results due to lack of training on only specific instruments and use of MSE as an evaluation parameter. While commenting about the loss functions, shift invariance is an important aspect that should be taken into consideration. This research work makes use of Spectral Loss Function in coordination with Wave-U-Net architecture, which automatically syncs the phase even if two audio sources are asynchronised. Spectral Loss Function solves the problem of shift invariance. The MUSDB18 Dataset is used to train the proposed model and the results are compared using evaluation metrics such as Signal to Distortion Ratio (SDR). After successful implementation of the Wave-U-Net Architecture with Spectral Loss Function it is observed that the accuracy of the system has been improved significantly.","PeriodicalId":224479,"journal":{"name":"2023 International Conference on Communication System, Computing and IT Applications (CSCITA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Communication System, Computing and IT Applications (CSCITA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSCITA55725.2023.10104853","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Existing audio source separation models usually operate on the magnitude spectrum and neglect phase information, while models that work directly on the raw waveform must capture long-range temporal correlations because of the high sampling rate of audio. Audio source separation has been a long-standing problem, and only a handful of solutions have been presented for it. This research work presents a Wave-U-Net architecture with a Spectral Loss Function that separates input audio into multiple audio files, one per instrument, along with the vocals. The existing Wave-U-Net architecture with a Mean Square Error (MSE) loss function produces poor-quality results because it is trained only on specific instruments and uses MSE as its evaluation parameter. When discussing loss functions, shift invariance is an important aspect to take into consideration: a plain time-domain MSE heavily penalizes an estimate that is merely shifted in time relative to the target. This research work uses a Spectral Loss Function in combination with the Wave-U-Net architecture, which automatically syncs the phase even if two audio sources are out of sync, thereby addressing the shift-invariance problem. The MUSDB18 dataset is used to train the proposed model, and the results are compared using evaluation metrics such as the Signal-to-Distortion Ratio (SDR). After successful implementation of the Wave-U-Net architecture with the Spectral Loss Function, it is observed that the accuracy of the system improves significantly.
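The architecture the abstract builds on is the time-domain Wave-U-Net: a 1-D U-Net that repeatedly convolves and decimates the raw waveform in an encoder, then upsamples back to full resolution in a decoder with skip connections. Below is a minimal PyTorch sketch of that encoder-decoder shape; the depth, channel widths, kernel sizes, and "same"-padding simplification are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal Wave-U-Net-style sketch (assumed PyTorch implementation,
# not the authors' code). Input length must be divisible by 2**depth.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WaveUNet(nn.Module):
    def __init__(self, num_sources=2, depth=4, base_ch=24):
        super().__init__()
        self.down, self.up = nn.ModuleList(), nn.ModuleList()
        ch_in = 1
        for i in range(depth):                 # encoder: conv, then decimate by 2
            ch_out = base_ch * (i + 1)
            self.down.append(nn.Conv1d(ch_in, ch_out, kernel_size=15, padding=7))
            ch_in = ch_out
        self.bottleneck = nn.Conv1d(ch_in, ch_in, kernel_size=15, padding=7)
        for i in reversed(range(depth)):       # decoder: upsample, concat skip, conv
            ch_out = base_ch * (i + 1)
            self.up.append(nn.Conv1d(ch_in + ch_out, ch_out, kernel_size=5, padding=2))
            ch_in = ch_out
        self.out = nn.Conv1d(ch_in, num_sources, kernel_size=1)

    def forward(self, x):                      # x: (batch, 1, time)
        skips = []
        for conv in self.down:
            x = F.leaky_relu(conv(x))
            skips.append(x)                    # keep full-resolution features
            x = x[:, :, ::2]                   # decimate by 2
        x = F.leaky_relu(self.bottleneck(x))
        for conv in self.up:
            x = F.interpolate(x, scale_factor=2, mode="linear", align_corners=False)
            x = F.leaky_relu(conv(torch.cat([x, skips.pop()], dim=1)))
        return torch.tanh(self.out(x))         # (batch, num_sources, time)
```

For a stereo mixture one would either process channels independently or widen the input to two channels; the sketch keeps a mono input for brevity.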
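The spectral loss that replaces MSE can be illustrated as a distance between STFT magnitude spectrograms: because the magnitude spectrum is largely insensitive to small time shifts, two slightly misaligned sources incur a far smaller penalty than they would under waveform MSE. This is a minimal sketch assuming PyTorch's torch.stft; the FFT size and hop length are placeholder values, not the paper's settings.

```python
# Sketch of a spectrogram-magnitude loss (illustrative, not the
# paper's exact formulation). Inputs are waveforms of shape (batch, time).
import torch

def spectral_loss(estimate, target, n_fft=1024, hop=256):
    """MSE between STFT magnitude spectrograms of estimate and target."""
    window = torch.hann_window(n_fft, device=estimate.device)
    est_mag = torch.stft(estimate, n_fft, hop_length=hop, window=window,
                         return_complex=True).abs()
    tgt_mag = torch.stft(target, n_fft, hop_length=hop, window=window,
                         return_complex=True).abs()
    return torch.mean((est_mag - tgt_mag) ** 2)
```

The loss discards phase by design; since the time-domain network emits waveforms directly, their phase stays consistent with the audio itself, which is how the abstract's "automatic phase syncing" can be read.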
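SDR, the evaluation metric named above, measures the energy of the reference signal relative to the energy of the residual distortion, in decibels. The NumPy sketch below computes this basic ratio; note that published MUSDB18 results are normally produced with the BSS Eval / museval toolkit, which decomposes the error more finely than this simplification does.

```python
# Simplified SDR in dB for time-aligned 1-D signals (an approximation
# of BSS Eval's SDR, for illustration only).
import numpy as np

def sdr(reference, estimate, eps=1e-9):
    """10 * log10(reference energy / distortion energy)."""
    distortion = reference - estimate
    return 10 * np.log10(
        (np.sum(reference ** 2) + eps) / (np.sum(distortion ** 2) + eps)
    )
```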