A Novel Approach to Speech Signal Segmentation Based on Time-Frequency Analysis

A. Alimuradov, A. Tychkov, P. Churakov, D. S. Dudnikov
2022 6th Scientific School Dynamics of Complex Networks and their Applications (DCNA), published 2022-09-14
DOI: 10.1109/DCNA56428.2022.9923223 (https://doi.org/10.1109/DCNA56428.2022.9923223)

Abstract

The accuracy of speech signal segmentation depends directly on the parameters used to locate the beginning and end boundaries of informative fragments in a continuous speech stream. The purpose of this work is to increase the efficiency of speech/pause segmentation through time-frequency analysis of speech signals. A novel approach to speech/pause segmentation is proposed, based on analysis of the mean frequency (in the frequency domain) and the short-term energy of the Teager operator function (in the time domain). The approach is distinguished by an auxiliary algorithm that corrects speech/pause segmentation errors and is based on the physiological functioning of the respiratory organs during the formation of a continuous speech stream. A brief overview of the informative speech signal parameters used for speech/pause segmentation is presented, and the proposed approach is described in detail. The approach is compared with known speech/pause segmentation methods on clean and noisy speech signals. The findings show that methods based on the proposed approach achieve the best speech/pause segmentation results for both clean and noisy speech signals; that the ratio of the short-term energy of the Teager operator function to the mean frequency is the informative parameter most relevant to the segmentation problem; and that the auxiliary algorithm for correcting false states improves segmentation efficiency.
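The informative parameter named in the abstract, the ratio of short-term Teager energy to mean frequency, can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything specific here is an assumption: the discrete Teager-Kaiser form of the operator, short-term energy taken as the mean absolute operator output per frame, mean frequency taken as the power-weighted average frequency of the frame spectrum, the frame and hop lengths, and the relative threshold. The function names are hypothetical, and the auxiliary correction algorithm is not reproduced.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager-Kaiser operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def frame_features(x, fs, frame_len=400, hop=200):
    """Per-frame short-term Teager energy and mean frequency in Hz.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    psi = teager_energy(x)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    window = np.hanning(frame_len)
    energies, mean_freqs = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        stop = start + frame_len
        # Time domain: average magnitude of the Teager operator output.
        energies.append(np.mean(np.abs(psi[start:stop])))
        # Frequency domain: power-weighted average frequency of the frame.
        power = np.abs(np.fft.rfft(x[start:stop] * window)) ** 2
        mean_freqs.append(np.sum(freqs * power) / (np.sum(power) + 1e-12))
    return np.array(energies), np.array(mean_freqs)

def speech_mask(x, fs, rel_threshold=0.1):
    """Mark a frame as speech when its energy/frequency ratio is large.

    The thresholding rule is an assumption made for this sketch.
    """
    energies, mean_freqs = frame_features(x, fs)
    ratio = energies / (mean_freqs + 1e-12)
    return ratio > rel_threshold * ratio.max()

# Synthetic check: a 200 Hz tone between two low-level noise "pauses".
fs = 8000
rng = np.random.default_rng(0)
t = np.arange(int(1.5 * fs)) / fs
gate = (t >= 0.5) & (t < 1.0)
signal = gate * np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(t.size)
mask = speech_mask(signal, fs)
```

The ratio separates the two classes in this toy signal because broadband noise has low Teager energy and a high mean frequency, while the low-frequency tone has high Teager energy and a low mean frequency. Real speech in noise would need tuned thresholds and a correction stage for false states such as the one the authors describe.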