A Novel Approach to Speech Signal Segmentation Based on Time-Frequency Analysis

2022 6th Scientific School Dynamics of Complex Networks and their Applications (DCNA). Pub Date: 2022-09-14. DOI: 10.1109/DCNA56428.2022.9923223

A. Alimuradov, A. Tychkov, P. Churakov, D. S. Dudnikov



Abstract

The accuracy of speech signal segmentation depends directly on the parameters used to locate the beginning and end of informative fragments in a continuous speech stream. The purpose of this work is to increase the efficiency of speech/pause segmentation through time-frequency analysis of speech signals. A novel approach to speech/pause segmentation is proposed, based on analysis of the mean frequency (in the frequency domain) and the short-term energy of the Teager operator function (in the time domain). The approach is distinguished by an auxiliary algorithm that corrects speech/pause segmentation errors, developed on the basis of the physiological functioning of the respiratory organs during the formation of a continuous speech stream. A brief overview of the informative parameters of speech signals used for speech/pause segmentation is presented, and the performance of the proposed approach is described in detail. The proposed approach is compared with known speech/pause segmentation methods on clean and noisy speech signals. The findings show that (1) methods based on the proposed approach achieve the best speech/pause segmentation results for both clean and noisy speech signals; (2) the ratio of the short-term energy of the Teager operator function to the mean frequency is the informative parameter most relevant to the segmentation problem; and (3) the auxiliary algorithm for correcting false states improves segmentation efficiency.
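The informative parameter named in the abstract, the ratio of the short-term Teager energy to the mean frequency, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract gives no formulas, so everything below is an assumption: the Teager operator is taken as the standard discrete Teager-Kaiser form x[n]^2 - x[n-1]*x[n+1], the short-term energy as the per-frame sum of its squared values, the mean frequency as the power-weighted spectral centroid, and the frame length, hop, relative threshold, and function names are hypothetical. The auxiliary error-correction algorithm is not sketched because the abstract does not describe its rules.

```python
import numpy as np


def teager_operator(x):
    """Discrete Teager-Kaiser operator (assumed form): x[n]^2 - x[n-1]*x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]  # edge samples stay at zero
    return psi


def mean_frequency(frame, fs):
    """Power-weighted mean frequency (spectral centroid) of one frame, in Hz."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    total = power.sum()
    return float((freqs * power).sum() / total) if total > 0 else 0.0


def teager_to_mean_frequency_ratio(x, fs, frame_len=400, hop=200):
    """Per-frame ratio of short-term Teager energy to mean frequency."""
    x = np.asarray(x, dtype=float)
    psi = teager_operator(x)
    ratios = []
    for start in range(0, len(x) - frame_len + 1, hop):
        stop = start + frame_len
        energy = np.sum(psi[start:stop] ** 2)  # assumed "short-term energy"
        f_mean = mean_frequency(x[start:stop], fs)
        ratios.append(energy / f_mean if f_mean > 0 else 0.0)
    return np.array(ratios)


def speech_pause_mask(x, fs, rel_threshold=0.05, frame_len=400, hop=200):
    """True for frames labelled speech; the relative threshold is hypothetical."""
    ratios = teager_to_mean_frequency_ratio(x, fs, frame_len, hop)
    if ratios.size == 0 or ratios.max() <= 0:
        return np.zeros(ratios.shape, dtype=bool)
    return ratios > rel_threshold * ratios.max()
```

Under these assumptions, speech frames combine high Teager energy with a low mean frequency, while broadband pause noise has low energy and a high mean frequency, so the ratio separates the two classes more sharply than either quantity alone.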


