An Effective Real-Time Audio Segmentation Method Based on Time-Frequency Energy Analysis

2013 Third International Conference on Instrumentation, Measurement, Computer, Communication and Control. Pub Date: 2013-09-21. DOI: 10.1109/IMCCC.2013.222

Chang Gao, Haifeng Li, Lin Ma, Wei Zhang


Citations: 0

Abstract




Audio segmentation is a vital preprocessing step in several audio processing applications. This paper proposes an effective multi-stage, real-time audio segmentation method based on time-frequency energy analysis. An energy distribution model for different frequency bands is built in the Mel frequency domain. In the rough segmentation stage, the start and end points are estimated from time-domain energy. Because the frequency-domain energy of audio and of silence show different characteristics in the energy distribution model, the exact segmentation stage then detects the endpoints from frequency-domain energy. A strategy for initializing and dynamically adjusting the thresholds is also described. Experimental results show that the method reduces the false alarm rate and missed detection rate by 3.6% and 7.0%, respectively, compared with GLR-BIC, and by 7.7% and 11.5%, respectively, compared with the double-threshold method. Statistics show that audio recognition accuracy is higher for sentences lasting 1 s to 6 s and 6 s to 10 s; 98% of the sentences segmented by this method fall within these durations, more than with the other two methods.
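The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the energy distribution model nor the threshold rules, so everything below is an assumption made for illustration. In particular, the function names (`segment`, `mel_band_energies`, `demo_signal`), the rectangular Mel-spaced bands (instead of triangular Mel filters), the naive DFT, the ratios `time_ratio` and `band_ratio`, the smoothing factor `alpha`, and the rule "a frame is audio if any band exceeds a multiple of the silence model" are all hypothetical choices.

```python
import cmath
import math
import random


def hz_to_mel(freq_hz):
    """Common Hz-to-Mel mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def time_energy(frame):
    """Time-domain energy: mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def mel_band_energies(frame, sample_rate, n_bands=8):
    """Spectral energy grouped into n_bands equal-width bands on the Mel axis."""
    n = len(frame)
    mel_top = hz_to_mel(sample_rate / 2.0)
    bands = [0.0] * n_bands
    for k in range(1, n // 2):  # skip DC, stay below Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))  # naive DFT bin k
        band = min(int(hz_to_mel(k * sample_rate / n) / mel_top * n_bands),
                   n_bands - 1)
        bands[band] += abs(coeff) ** 2 / n
    return bands


def segment(samples, sample_rate, frame_len=64, init_frames=5,
            time_ratio=3.0, band_ratio=10.0, alpha=0.05):
    """Return (start_frame, end_frame) pairs, end exclusive."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    # Threshold initialization (assumption): the first init_frames are silence.
    head = frames[:init_frames]
    floor = sum(time_energy(f) for f in head) / len(head)
    head_bands = [mel_band_energies(f, sample_rate) for f in head]
    silence_model = [sum(col) / len(col) for col in zip(*head_bands)]

    # Stage 1 (rough): time-domain energy against an adaptive threshold.
    rough, start = [], None
    for i, frame in enumerate(frames):
        energy = time_energy(frame)
        if energy > time_ratio * floor:
            if start is None:
                start = i
        else:
            # Dynamic adjustment: track the noise floor during silence.
            floor = (1 - alpha) * floor + alpha * energy
            if start is not None:
                rough.append((start, i))
                start = None
    if start is not None:
        rough.append((start, len(frames)))

    def is_audio(frame):
        bands = mel_band_energies(frame, sample_rate)
        return any(b > band_ratio * s for b, s in zip(bands, silence_model))

    # Stage 2 (exact): trim edge frames whose band energies look like silence.
    exact = []
    for start, end in rough:
        while start < end and not is_audio(frames[start]):
            start += 1
        while end > start and not is_audio(frames[end - 1]):
            end -= 1
        if start < end:
            exact.append((start, end))
    return exact


def demo_signal(rate=8000, seed=0):
    """Hypothetical input: faint noise, a 440 Hz tone, then faint noise."""
    rng = random.Random(seed)
    hiss = lambda n: [rng.uniform(-0.01, 0.01) for _ in range(n)]
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / rate) for t in range(1280)]
    return hiss(640) + tone + hiss(640)


print(segment(demo_signal(), 8000))
```

In the synthetic signal the tone occupies frames 10 through 29 (64 samples per frame), so the sketch should report a single segment `(10, 30)`. The cheap time-domain test runs on every frame, while the costlier spectral test runs only on the edges of candidate segments, which is one plausible way to keep a multi-stage scheme suitable for real-time use.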

