An Effective Real-Time Audio Segmentation Method Based on Time-Frequency Energy Analysis

Chang Gao, Haifeng Li, Lin Ma, Wei Zhang
Published in: 2013 Third International Conference on Instrumentation, Measurement, Computer, Communication and Control
Publication date: 2013-09-21
DOI: 10.1109/IMCCC.2013.222 (https://doi.org/10.1109/IMCCC.2013.222)
Citations: 0

Abstract

Audio segmentation is a vital preprocessing step in many audio processing applications. This paper proposes an effective multi-stage, real-time audio segmentation method based on time-frequency energy analysis. An energy distribution model for different frequency bands is built in the Mel frequency domain. In the rough segmentation stage, the starting and finishing points are estimated from time-domain energy. Because the frequency-domain energy of audio and of silence show different characteristics in the energy distribution model, the exact segmentation stage then detects the endpoints from frequency-domain energy. The strategy for initializing and dynamically adjusting the thresholds is also described. Experimental results show that the method reduces the false alarm rate and the missed detection rate by 3.6% and 7.0% compared to GLR-BIC, and by 7.7% and 11.5% compared to the double threshold method. Our statistics show that audio recognition accuracy is higher for sentences lasting 1 s to 6 s and 6 s to 10 s, and 98% of the sentences segmented by this method fall within these durations, a larger share than with the other two methods.
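The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it describes: a cheap time-domain energy gate (rough stage), a Mel-band energy check for frames that pass it (exact stage), thresholds initialized from leading frames, and noise floors adjusted while the signal is judged silent. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the frame size, the rectangular Mel bands, the multipliers `k_time` and `k_freq`, and the smoothing factor `alpha`.

```python
import cmath
import math
import random


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def time_energy(frame):
    """Mean squared amplitude of one frame (time-domain energy)."""
    return sum(x * x for x in frame) / len(frame)


def mel_band_energies(frame, sample_rate, n_bands=8):
    """Energy per band, with band edges equally spaced on the Mel scale.

    Simplification (assumption): rectangular bands and a naive DFT,
    not a triangular Mel filter bank.
    """
    n = len(frame)
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * b / n_bands) for b in range(n_bands + 1)]
    bands = [0.0] * n_bands
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        freq = k * sample_rate / n
        band = n_bands - 1  # default: last band (covers the Nyquist bin)
        for b in range(n_bands):
            if freq < edges[b + 1]:
                band = b
                break
        bands[band] += abs(coeff) ** 2 / n
    return bands


def segment(signal, sample_rate, size=128, hop=64, init_frames=5,
            k_time=4.0, k_freq=4.0, alpha=0.95):
    """Return a list of (start_s, end_s) pairs for detected audio segments."""
    frames = [signal[i:i + size]
              for i in range(0, len(signal) - size + 1, hop)]
    if not frames:
        return []
    # Threshold initialization (assumption): the leading frames are silence.
    lead = frames[:init_frames]
    floor_t = sum(time_energy(f) for f in lead) / len(lead)
    lead_bands = [mel_band_energies(f, sample_rate) for f in lead]
    floor_f = [sum(col) / len(lead) for col in zip(*lead_bands)]

    segments, start = [], None
    for i, frame in enumerate(frames):
        energy = time_energy(frame)
        active = energy > k_time * floor_t  # rough stage: time-domain gate
        if active:
            bands = mel_band_energies(frame, sample_rate)
            # Exact stage: at least one Mel band must exceed its noise floor.
            active = any(e > k_freq * f for e, f in zip(bands, floor_f))
            if not active:  # dynamic adjustment of the band noise floors
                floor_f = [alpha * f + (1 - alpha) * e
                           for e, f in zip(bands, floor_f)]
        else:  # dynamic adjustment of the time-domain noise floor
            floor_t = alpha * floor_t + (1 - alpha) * energy
        if active and start is None:
            start = i
        elif not active and start is not None:
            end_sample = (i - 1) * hop + size  # end of the last active frame
            segments.append((start * hop / sample_rate,
                             end_sample / sample_rate))
            start = None
    if start is not None:
        segments.append((start * hop / sample_rate, len(signal) / sample_rate))
    return segments


def demo_signal(sample_rate=8000):
    """0.4 s of low-level noise with a 440 Hz tone between 0.1 s and 0.3 s."""
    rng = random.Random(0)
    tone_start, tone_stop = sample_rate // 10, 3 * sample_rate // 10
    out = []
    for t in range(4 * sample_rate // 10):
        x = rng.uniform(-0.01, 0.01)
        if tone_start <= t < tone_stop:
            x += 0.5 * math.sin(2 * math.pi * 440 * t / sample_rate)
        out.append(x)
    return out


if __name__ == "__main__":
    print(segment(demo_signal(), 8000))
```

Running the frequency check only on frames that pass the time-domain gate keeps the per-frame cost low during silence, which is one plausible reading of why a multi-stage design suits real-time use. Detected endpoints are quantized to frame boundaries, so their resolution is limited by `hop` and `size`.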