Deep Multi-channel Speech Source Separation with Time-frequency Masking for Spatially Filtered Microphone Input Signal
M. Togami
2020 28th European Signal Processing Conference (EUSIPCO), pp. 266-270
DOI: 10.23919/Eusipco47968.2020.9287810
Abstract
In this paper, we propose a multi-channel speech source separation technique that connects unsupervised spatial filtering, performed without a deep neural network (DNN), to DNN-based speech source separation in a cascade manner. In speech source separation, estimation of the covariance matrix is a highly important step. Recent studies have shown that it is effective to estimate the covariance matrix by multiplying the cross-correlation of the microphone input signal with a time-frequency mask (TFM) inferred by a DNN. However, the implicit assumption that each time-frequency bin is dominated by a single source does not actually hold, and the overlap of multiple speech sources degrades the estimation accuracy of the multi-channel covariance matrix. Instead, we propose a multi-channel covariance matrix estimation technique that applies the TFM to the speech signal already separated by the unsupervised spatial filtering. The pre-filtered signal reduces the overlap of multiple speech sources and increases the estimation accuracy of the covariance matrix. Experimental results show that the proposed multi-channel covariance matrix estimation technique is effective.
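The mask-weighted covariance estimation that the abstract builds on can be sketched as follows. This is a minimal illustrative implementation, not the paper's code: the function name and shapes are assumptions, and in the proposed method the input `X` would be the output of the unsupervised spatial pre-filter rather than the raw microphone signal.

```python
import numpy as np

def masked_covariance(X, mask, eps=1e-8):
    """TFM-weighted spatial covariance estimate (illustrative sketch).

    X    : complex STFT of an M-channel signal, shape (M, F, T)
    mask : real-valued time-frequency mask in [0, 1], shape (F, T)
    Returns R with shape (F, M, M): per-frequency covariance matrices,
    i.e. the mask-weighted average of the rank-1 terms x(f,t) x(f,t)^H.
    """
    M, F, T = X.shape
    R = np.zeros((F, M, M), dtype=complex)
    for f in range(F):
        w = mask[f]                       # (T,) weights for this frequency
        Xf = X[:, f, :]                   # (M, T) multichannel frames
        # sum_t w[t] * x_t x_t^H, normalized by the total mask weight
        R[f] = (w * Xf) @ Xf.conj().T / max(w.sum(), eps)
    return R
```

Because the mask weights are real and non-negative, each `R[f]` is Hermitian and positive semi-definite by construction; the abstract's point is that when sources overlap, the raw-input cross-correlation mixes their statistics, so applying the mask to a pre-separated signal yields a cleaner estimate.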