Authors: Chang Gao, Haifeng Li, Lin Ma, Wei Zhang
Published in: 2013 Third International Conference on Instrumentation, Measurement, Computer, Communication and Control (2013-09-21)
DOI: 10.1109/IMCCC.2013.222
An Effective Real-Time Audio Segmentation Method Based on Time-Frequency Energy Analysis
Audio segmentation is a vital preprocessing step in many audio processing applications. This paper proposes an effective multi-stage, real-time audio segmentation method based on time-frequency energy analysis. An energy distribution model for different frequency bands is built in the Mel frequency domain. In the rough segmentation stage, candidate start and end points are estimated from time-domain energy. Because the frequency-domain energy of audio and of silence shows different characteristics under the energy distribution model, the exact segmentation stage then detects the endpoints from frequency-domain energy. The strategy for initializing and dynamically adjusting the thresholds is also described. Experimental results show that the method reduces the false alarm rate and the missed detection rate by 3.6% and 7.0%, respectively, compared with GLR-BIC, and by 7.7% and 11.5%, respectively, compared with the double threshold method. Statistics show that audio recognition accuracy is higher for sentences with durations of 1 s to 6 s and 6 s to 10 s. With this method, 98% of the segmented sentences fall within these durations, a larger share than with the other two methods.
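The rough segmentation stage can be illustrated with a minimal sketch, under stated assumptions. The abstract does not give the thresholding rule, so everything below is hypothetical: the function names `frame_energies` and `rough_segments`, the parameters `init_frames`, `factor` and `alpha`, the choice to initialize the noise floor from the first few frames, and the exponential smoothing used to adjust it. The sketch covers only the time-domain energy step, not the Mel-band energy model or the exact segmentation stage.

```python
import math


def frame_energies(samples, frame_len, hop):
    """Return the mean squared amplitude of each full frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def rough_segments(energies, init_frames=5, factor=4.0, alpha=0.9):
    """Return (start_frame, end_frame) pairs, with end_frame exclusive.

    Assumptions (not taken from the paper):
    - the first `init_frames` frames are silence and initialize the noise floor;
    - a frame is active when its energy exceeds `factor * noise_floor`;
    - the noise floor is smoothed with weight `alpha` on silent frames only.
    """
    noise = sum(energies[:init_frames]) / init_frames
    segments = []
    start = None
    for i, energy in enumerate(energies):
        if energy > factor * noise:
            if start is None:
                start = i  # candidate starting point
        else:
            if start is not None:
                segments.append((start, i))  # candidate finishing point
                start = None
            # Dynamic threshold adjustment: track the background level.
            noise = alpha * noise + (1 - alpha) * energy
    if start is not None:
        segments.append((start, len(energies)))
    return segments


# Synthetic signal: low-level hum, a 440 Hz tone burst, then low-level hum again.
rate = 8000
quiet = [0.01 * math.sin(2 * math.pi * 50 * n / rate) for n in range(4000)]
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(4000)]
signal = quiet + tone + quiet

energies = frame_energies(signal, frame_len=200, hop=200)
segments = rough_segments(energies)
print(segments)  # [(20, 40)]
```

In a two-stage design like the one the abstract describes, frame indices such as these would only be candidates; a second pass over frequency-band energies around each candidate would then refine the endpoints.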