Robust voice activity detection using empirical mode decomposition and modulation spectrum analysis
Y. Kanai, M. Unoki
2012 8th International Symposium on Chinese Spoken Language Processing, 2012-12-01
DOI: 10.1109/ISCSLP.2012.6423519 (https://doi.org/10.1109/ISCSLP.2012.6423519)
Voice activity detection (VAD) is used to detect speech and non-speech periods in observed signals. However, current VAD techniques have a serious problem: the accuracy with which speech periods are detected drops drastically for noisy speech and for mixtures of speech and non-speech sounds such as music and environmental sounds. VAD therefore needs to be robust enough to detect speech periods accurately in these situations. This paper proposes an approach to robust VAD that uses empirical mode decomposition (EMD) and modulation spectrum analysis (MSA) to resolve these problems. The method first reduces background noise using EMD, without estimating the SNR (noise conditions), and then determines speech/non-speech periods using MSA. Three experiments on VAD in real environments were conducted to evaluate the proposed method against typical methods (Otsu's and G.729B). The results demonstrated that the proposed method detected speech periods more accurately than the typical methods.
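The abstract does not give the details of the MSA stage, so the following is only a minimal sketch of the general idea behind a modulation-spectrum cue, not the authors' method. It assumes that speech-like signals concentrate their envelope modulation energy at low modulation frequencies (around the syllable rate of a few hertz), and it measures the fraction of envelope power in an assumed 2-16 Hz band. The function names, the 10 ms frame length, and the band limits are all hypothetical choices made for illustration, and the EMD noise-reduction stage is omitted entirely.

```python
import math


def frame_envelope(signal, fs, frame_ms=10.0):
    """Return the RMS amplitude of consecutive non-overlapping frames.

    This is a crude temporal envelope sampled at 1000 / frame_ms Hz
    (an assumption for this sketch, not taken from the paper).
    """
    n = int(fs * frame_ms / 1000.0)
    frames = len(signal) // n
    return [
        math.sqrt(sum(x * x for x in signal[i * n:(i + 1) * n]) / n)
        for i in range(frames)
    ]


def modulation_band_ratio(envelope, env_rate, band=(2.0, 16.0)):
    """Fraction of non-DC envelope power whose modulation frequency is in `band`.

    Uses a direct DFT of the mean-removed envelope; `band` is a
    hypothetical speech modulation range in Hz.
    """
    m = len(envelope)
    mean = sum(envelope) / m
    centered = [e - mean for e in envelope]
    in_band = 0.0
    total = 0.0
    for k in range(1, m // 2 + 1):
        re = sum(c * math.cos(2 * math.pi * k * t / m) for t, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * k * t / m) for t, c in enumerate(centered))
        power = re * re + im * im
        total += power
        if band[0] <= k * env_rate / m <= band[1]:
            in_band += power
    return in_band / total if total > 0.0 else 0.0


def am_tone(fs, seconds, carrier_hz, mod_hz):
    """Synthetic test signal: a sine carrier with sinusoidal amplitude modulation."""
    return [
        (1.0 + 0.8 * math.sin(2 * math.pi * mod_hz * t / fs))
        * math.sin(2 * math.pi * carrier_hz * t / fs)
        for t in range(int(fs * seconds))
    ]


if __name__ == "__main__":
    fs = 8000
    env_rate = 100.0  # 10 ms frames -> 100 envelope samples per second
    for mod_hz in (4.0, 40.0):
        env = frame_envelope(am_tone(fs, 2.0, 1000.0, mod_hz), fs)
        ratio = modulation_band_ratio(env, env_rate)
        print(f"modulation at {mod_hz} Hz: in-band ratio = {ratio:.3f}")
```

In this sketch, a tone modulated at 4 Hz yields a markedly higher in-band ratio than one modulated at 40 Hz, so thresholding the ratio per analysis window would give a simple speech/non-speech decision. A real detector would need a tuned threshold and band, and would apply the measure after noise reduction.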