Bin Liu, J. Tao, Fuyuan Mo, Ya Li, Zhengqi Wen, Shanfeng Liu
Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability
The 9th International Symposium on Chinese Spoken Language Processing, 2014-09-01
DOI: 10.1109/ISCSLP.2014.6936602
Citations: 3
Abstract
Voice activity detection (VAD) is an important pre-processing step that is widely used in speech-based systems. This paper proposes a robust VAD algorithm. The algorithm uses two features, the sub-band temporal envelope and the sub-band long-term signal variability, to distinguish speech from non-speech, including both stationary and non-stationary noise. The two features are combined through a fusion decision to make a robust VAD decision. The algorithm is unsupervised and low-complexity, and it operates without pre-trained models. Experimental results show that the proposed algorithm outperforms the baseline algorithms and handles a variety of noise environments over a wide range of signal-to-noise ratios. The proposed algorithm can be applied to speech-based systems.
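To illustrate the long-term signal variability idea only, the following Python sketch computes a full-band variability score from a precomputed power spectrogram. For each frequency bin it takes the entropy of the power distribution over the last `window` frames, then takes the variance of those entropies across bins. This follows one common formulation of long-term signal variability and is an assumption here: the abstract does not give the paper's sub-band formulation, its temporal-envelope feature, its fusion rule, or any thresholds. The names `long_term_variability`, `power_frames`, and `window` are hypothetical.

```python
import math


def long_term_variability(power_frames, window):
    """Variance across bins of the per-bin temporal entropy.

    power_frames: list of frames, each a list of non-negative powers,
                  one value per frequency bin (assumed input layout).
    window:       number of most recent frames used per score (assumed).
    Returns one score for each frame index >= window - 1.
    """
    eps = 1e-12  # keeps log() defined when a bin has zero power
    n_bins = len(power_frames[0])
    scores = []
    for m in range(window - 1, len(power_frames)):
        recent = power_frames[m - window + 1 : m + 1]
        entropies = []
        for k in range(n_bins):
            # Power of bin k over the window, normalised to sum to 1.
            column = [frame[k] + eps for frame in recent]
            total = sum(column)
            entropies.append(
                -sum((p / total) * math.log(p / total) for p in column)
            )
        # Variance of the entropies across frequency bins.
        mean = sum(entropies) / n_bins
        scores.append(sum((e - mean) ** 2 for e in entropies) / n_bins)
    return scores


# Stationary input: every bin has constant power over time.
stationary = [[1.0, 2.0, 3.0] for _ in range(5)]
# Non-stationary input: the second bin carries a short energy burst.
bursty = [[1.0, 1.0], [1.0, 1.0], [1.0, 100.0], [1.0, 1.0]]

stationary_scores = long_term_variability(stationary, 4)
bursty_scores = long_term_variability(bursty, 4)
```

With constant power in every bin, all bins share the same entropy, so the score stays near zero. A burst in one bin lowers that bin's entropy relative to the others and raises the score. A detector built on this idea would compare the score against a threshold, which this sketch does not choose.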