Method for asynchronous analysis of a glottal source based on a two-level autoregressive model of the speech signal

Izmeritel`naya Tekhnika, Pub Date: 2024-04-05, DOI: 10.32446/0368-1025it.2024-2-55-62

V. V. Savchenko, L. V. Savchenko


Citations: 0


Method for asynchronous analysis of a glottal source based on a two-level autoregressive model of the speech signal

This paper addresses the analysis of a glottal source over a short observation interval. Known methods of glottal source analysis suffer from insufficient performance regardless of how the data are prepared, whether synchronously with the fundamental tone (pitch) of speech sounds or asynchronously. A method for analyzing the glottal source based on a two-level autoregressive model of the speech signal is proposed, and its software implementation based on the high-speed Burg-Levinson computational procedure is described. The method does not require the observation sequence to be synchronized with the fundamental tone of the speech signal and has a relatively low computational cost. Using this implementation, a full-scale experiment was conducted on vowel sounds from a control speaker's speech. The results confirm the improved performance of the proposed method and yield requirements on the duration of the speech signal for real-time voice analysis: the optimal duration lies in the range from 32 to 128 ms. The results can be used in the development and study of digital speech communication, voice control, biometric, biomedical, and other speech systems in which the speaker's voice characteristics are of paramount importance.
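The abstract names the Burg-Levinson procedure but gives no formulas, so the following is only a minimal sketch of the textbook Burg recursion for a single autoregressive stage, not the authors' two-level model or their implementation. The function names `burg_ar` and `lp_residual`, the 8 kHz sampling rate, and the model order of 10 are all assumptions chosen for illustration; only the 32 ms frame length comes from the abstract. The recursion works on whatever frame it is given, with no alignment to pitch periods.

```python
import numpy as np


def burg_ar(x, order):
    """Estimate AR prediction-error filter coefficients with Burg's method.

    Returns (a, e), where a[0] == 1 and the model is
    x[n] + a[1]*x[n-1] + ... + a[order]*x[n-order] = w[n],
    and e is the estimated power of the excitation w[n].
    """
    x = np.asarray(x, dtype=float)
    if order < 0 or len(x) <= order:
        raise ValueError("frame must be longer than the model order")
    a = np.array([1.0])
    e = np.dot(x, x) / len(x)   # zero-order prediction error power
    f = x[1:].copy()            # forward prediction errors
    b = x[:-1].copy()           # backward prediction errors, delayed by one
    for _ in range(order):
        den = np.dot(f, f) + np.dot(b, b)
        if den == 0.0:
            # Nothing left to predict: pad the remaining coefficients with zeros.
            a = np.concatenate([a, np.zeros(order + 1 - len(a))])
            break
        k = -2.0 * np.dot(f, b) / den       # reflection coefficient
        a_ext = np.concatenate([a, [0.0]])
        a = a_ext + k * a_ext[::-1]         # Levinson-type order update
        e *= 1.0 - k * k
        # Update both error sequences from the old values, then realign them.
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
    return a, e


def lp_residual(x, a):
    """Inverse-filter the frame with A(z); the output is the prediction error."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, a)[:len(x)]


if __name__ == "__main__":
    fs = 8000                      # assumed sampling rate in Hz (not from the source)
    frame_len = fs * 32 // 1000    # 32 ms, the lower bound quoted in the abstract
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(frame_len)   # stand-in for a real speech frame
    a, e = burg_ar(frame, order=10)          # order 10 is an illustrative choice
    residual = lp_residual(frame, a)
    print(a.shape, residual.shape, e)
```

In classical linear-prediction analysis the residual returned by `lp_residual` is often treated as a rough proxy for the excitation signal; how the paper's second autoregressive level is built on top of such a stage is not stated in the abstract and is left out here.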
