Yuan Xie , Tao Zou , Junjie Yang , Weijun Sun , Shengli Xie
{"title":"欠确定混响环境中基于期望最大化的室内脉冲响应重塑","authors":"Yuan Xie , Tao Zou , Junjie Yang , Weijun Sun , Shengli Xie","doi":"10.1016/j.csl.2024.101664","DOIUrl":null,"url":null,"abstract":"<div><p>Source separation in an underdetermined reverberation environment is a very challenging issue. The classical method is based on the expectation–maximization algorithm. However, it is limited to high reverberation environments, resulting in bad or even invalid separation performance. To eliminate this restriction, a room impulse response reshaping-based expectation–maximization method is designed to solve the problem of source separation in an underdetermined reverberant environment. Firstly, a room impulse response reshaping technology is designed to eliminate the influence of audible echo on the reverberant environment, improving the quality of the received signals. Then, a new mathematical model of time-frequency mixing signals is established to reduce the approximation error of model transformation caused by high reverberation. Furthermore, an improved expectation–maximization method is proposed for real-time update learning rules of model parameters, and then the sources are separated using the estimators provided by the improved expectation–maximization method. Experimental results based on source separation of speech and music mixtures demonstrate that the proposed algorithm achieves better separation performance while maintaining much better robustness than popular expectation–maximization methods.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"88 ","pages":"Article 101664"},"PeriodicalIF":3.1000,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Room impulse response reshaping-based expectation–maximization in an underdetermined reverberant environment\",\"authors\":\"Yuan Xie , Tao Zou , Junjie Yang , Weijun Sun , Shengli Xie\",\"doi\":\"10.1016/j.csl.2024.101664\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Source separation in an underdetermined reverberation environment is a very challenging issue. The classical method is based on the expectation–maximization algorithm. However, it is limited to high reverberation environments, resulting in bad or even invalid separation performance. To eliminate this restriction, a room impulse response reshaping-based expectation–maximization method is designed to solve the problem of source separation in an underdetermined reverberant environment. Firstly, a room impulse response reshaping technology is designed to eliminate the influence of audible echo on the reverberant environment, improving the quality of the received signals. Then, a new mathematical model of time-frequency mixing signals is established to reduce the approximation error of model transformation caused by high reverberation. Furthermore, an improved expectation–maximization method is proposed for real-time update learning rules of model parameters, and then the sources are separated using the estimators provided by the improved expectation–maximization method. Experimental results based on source separation of speech and music mixtures demonstrate that the proposed algorithm achieves better separation performance while maintaining much better robustness than popular expectation–maximization methods.</p></div>\",\"PeriodicalId\":50638,\"journal\":{\"name\":\"Computer Speech and Language\",\"volume\":\"88 \",\"pages\":\"Article 101664\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2024-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Speech and Language\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0885230824000470\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Speech and Language","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0885230824000470","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Room impulse response reshaping-based expectation–maximization in an underdetermined reverberant environment
Source separation in an underdetermined reverberation environment is a very challenging issue. The classical method is based on the expectation–maximization algorithm. However, it is limited to high reverberation environments, resulting in bad or even invalid separation performance. To eliminate this restriction, a room impulse response reshaping-based expectation–maximization method is designed to solve the problem of source separation in an underdetermined reverberant environment. Firstly, a room impulse response reshaping technology is designed to eliminate the influence of audible echo on the reverberant environment, improving the quality of the received signals. Then, a new mathematical model of time-frequency mixing signals is established to reduce the approximation error of model transformation caused by high reverberation. Furthermore, an improved expectation–maximization method is proposed for real-time update learning rules of model parameters, and then the sources are separated using the estimators provided by the improved expectation–maximization method. Experimental results based on source separation of speech and music mixtures demonstrate that the proposed algorithm achieves better separation performance while maintaining much better robustness than popular expectation–maximization methods.
期刊介绍:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.