{"title":"Audio Source Separation using Wave-U-Net with Spectral Loss","authors":"Varun Patkar, Tanish Parmar, Parth Narvekar, Vedant Pawar, Joanne Gomes","doi":"10.1109/CSCITA55725.2023.10104853","DOIUrl":null,"url":null,"abstract":"Existing Audio Source Separation models usually operate using magnitude spectrum and neglect the phase information which results in long-range temporal correlations because of its high sampling rates. Audio source separation has been a problem since long and only a handful of solutions have been presented for it. This research work presents a Wave-U-Net architecture with Spectral Loss Function which separates input audio into multiple audio file of different instrument sounds along with vocals. Existing Wave-U-Net Architecture with Mean Square Error (MSE) loss function provides poor quality results due to lack of training on only specific instruments and use of MSE as an evaluation parameter. While commenting about the loss functions, shift invariance is an important aspect that should be taken into consideration. This research work makes use of Spectral Loss Function in coordination with Wave-U-Net architecture, which automatically syncs the phase even if two audio sources are asynchronised. Spectral Loss Function solves the problem of shift invariance. The MUSDB18 Dataset is used to train the proposed model and the results are compared using evaluation metrics such as Signal to Distortion Ratio (SDR). After successful implementation of the Wave-U-Net Architecture with Spectral Loss Function it is observed that the accuracy of the system has been improved significantly.","PeriodicalId":224479,"journal":{"name":"2023 International Conference on Communication System, Computing and IT Applications (CSCITA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Communication System, Computing and IT Applications (CSCITA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSCITA55725.2023.10104853","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Existing audio source separation models usually operate on the magnitude spectrum and neglect phase information, while models that work directly on the raw waveform must capture long-range temporal correlations because of the high sampling rate of audio. Audio source separation has been a long-standing problem, and only a handful of solutions have been presented for it. This research work presents a Wave-U-Net architecture with a Spectral Loss Function that separates input audio into multiple audio files, one per instrument, along with the vocals. The existing Wave-U-Net architecture with a Mean Square Error (MSE) loss function produces poor-quality results because it is trained only on specific instruments and uses MSE as its evaluation parameter. When discussing loss functions, shift invariance is an important aspect to take into consideration: a plain time-domain MSE heavily penalizes an estimate that is merely shifted in time relative to the target. This research work uses a Spectral Loss Function in combination with the Wave-U-Net architecture, which automatically syncs the phase even if two audio sources are out of sync, thereby addressing the shift-invariance problem. The MUSDB18 dataset is used to train the proposed model, and the results are compared using evaluation metrics such as the Signal-to-Distortion Ratio (SDR). After successful implementation of the Wave-U-Net architecture with the Spectral Loss Function, it is observed that the accuracy of the system improves significantly.
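The architecture the abstract builds on is the time-domain Wave-U-Net: a 1-D U-Net that repeatedly convolves and decimates the raw waveform in an encoder, then upsamples back to full resolution in a decoder with skip connections. Below is a minimal PyTorch sketch of that encoder-decoder shape; the depth, channel widths, kernel sizes, and "same"-padding simplification are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal Wave-U-Net-style sketch (assumed PyTorch implementation,
# not the authors' code). Input length must be divisible by 2**depth.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WaveUNet(nn.Module):
    def __init__(self, num_sources=2, depth=4, base_ch=24):
        super().__init__()
        self.down, self.up = nn.ModuleList(), nn.ModuleList()
        ch_in = 1
        for i in range(depth):                 # encoder: conv, then decimate by 2
            ch_out = base_ch * (i + 1)
            self.down.append(nn.Conv1d(ch_in, ch_out, kernel_size=15, padding=7))
            ch_in = ch_out
        self.bottleneck = nn.Conv1d(ch_in, ch_in, kernel_size=15, padding=7)
        for i in reversed(range(depth)):       # decoder: upsample, concat skip, conv
            ch_out = base_ch * (i + 1)
            self.up.append(nn.Conv1d(ch_in + ch_out, ch_out, kernel_size=5, padding=2))
            ch_in = ch_out
        self.out = nn.Conv1d(ch_in, num_sources, kernel_size=1)

    def forward(self, x):                      # x: (batch, 1, time)
        skips = []
        for conv in self.down:
            x = F.leaky_relu(conv(x))
            skips.append(x)                    # keep full-resolution features
            x = x[:, :, ::2]                   # decimate by 2
        x = F.leaky_relu(self.bottleneck(x))
        for conv in self.up:
            x = F.interpolate(x, scale_factor=2, mode="linear", align_corners=False)
            x = F.leaky_relu(conv(torch.cat([x, skips.pop()], dim=1)))
        return torch.tanh(self.out(x))         # (batch, num_sources, time)
```

For a stereo mixture one would either process channels independently or widen the input to two channels; the sketch keeps a mono input for brevity.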
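The spectral loss that replaces MSE can be illustrated as a distance between STFT magnitude spectrograms: because the magnitude spectrum is largely insensitive to small time shifts, two slightly misaligned sources incur a far smaller penalty than they would under waveform MSE. This is a minimal sketch assuming PyTorch's torch.stft; the FFT size and hop length are placeholder values, not the paper's settings.

```python
# Sketch of a spectrogram-magnitude loss (illustrative, not the
# paper's exact formulation). Inputs are waveforms of shape (batch, time).
import torch

def spectral_loss(estimate, target, n_fft=1024, hop=256):
    """MSE between STFT magnitude spectrograms of estimate and target."""
    window = torch.hann_window(n_fft, device=estimate.device)
    est_mag = torch.stft(estimate, n_fft, hop_length=hop, window=window,
                         return_complex=True).abs()
    tgt_mag = torch.stft(target, n_fft, hop_length=hop, window=window,
                         return_complex=True).abs()
    return torch.mean((est_mag - tgt_mag) ** 2)
```

The loss discards phase by design; since the time-domain network emits waveforms directly, their phase stays consistent with the audio itself, which is how the abstract's "automatic phase syncing" can be read.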
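SDR, the evaluation metric named above, measures the energy of the reference signal relative to the energy of the residual distortion, in decibels. The NumPy sketch below computes this basic ratio; note that published MUSDB18 results are normally produced with the BSS Eval / museval toolkit, which decomposes the error more finely than this simplification does.

```python
# Simplified SDR in dB for time-aligned 1-D signals (an approximation
# of BSS Eval's SDR, for illustration only).
import numpy as np

def sdr(reference, estimate, eps=1e-9):
    """10 * log10(reference energy / distortion energy)."""
    distortion = reference - estimate
    return 10 * np.log10(
        (np.sum(reference ** 2) + eps) / (np.sum(distortion ** 2) + eps)
    )
```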