Multi-Dimensional Spectral Process for Cepstral Feature Engineering & Formant Coding

Journal: Journal of Electrical and Electronics Engineering (JCR Q4, Engineering)
Publication date: 2022-08-15
Publication type: Journal Article
DOI: 10.33140/jeee.01.01.01 (https://doi.org/10.33140/jeee.01.01.01)
Citations: 0

Abstract

The fundamental frequency feature is essential for Automatic Speech Recognition because its patterns convey paralanguage and its tuning normalizes other speech features. Human speech is multidimensional because it is minimally represented by three variables: the intonation (or pitch), the formants (or timbre), and the speech resolution (or depth). These variables represent the hidden states of the local glottal variation, the vocal tract response, and the frequency scale, respectively. Computing them one by one is less efficient than computing them together, so this article introduces a new speech feature extraction approach. The article is introductory; it focuses on the basic concepts of the new approach and does not elaborate on all applications. It demonstrates that the unit of a cepstral value, which is a spectral value of spectra, is a unit of acceleration, since its discrete variable, the quefrency, can be expressed in Hertz-per-microsecond. The article shows how to produce refined voice analysis from robust estimates and how to reconstruct speech signals from feature spaces. It concludes that the pitch track of the new approach is as good as that of two open-source pitch extractors. Combining multiple processes, attenuating background noise, and enabling distant-speech recognition, we introduce the Speech Quefrency Transform (SQT) approach as well as multiple quefrency scales. SQT is a set of frequency transforms whose spectral leakages are controlled according to a frequency-modulation model. SQT maps the stationarity of a time series onto a hyperspace that, when reduced for pitch track extraction, resembles the cepstrogram.
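For readers unfamiliar with the terms, the idea of a cepstral value as "a spectral value of spectra" and of reading a pitch estimate off the quefrency axis can be illustrated with the classic real cepstrum. The sketch below is not the SQT approach described in the abstract. It is a minimal textbook example, and the function names (`real_cepstrum`, `cepstral_pitch`), the 50-400 Hz search range, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np


def real_cepstrum(frame):
    """Real cepstrum of one frame: inverse FFT of the log-magnitude spectrum."""
    windowed = frame * np.hanning(len(frame))      # taper to limit spectral leakage
    magnitude = np.abs(np.fft.rfft(windowed))      # first spectrum
    log_mag = np.log(magnitude + 1e-12)            # small offset avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame))     # "spectrum of the spectrum"


def cepstral_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate f0 (Hz) from the largest cepstral peak in a plausible pitch range.

    fmin and fmax are illustrative defaults, not values taken from the article.
    """
    cep = real_cepstrum(frame)
    q_lo = int(fs / fmax)                          # shortest period, in samples
    q_hi = int(fs / fmin)                          # longest period, in samples
    peak = q_lo + int(np.argmax(cep[q_lo:q_hi + 1]))
    return fs / peak                               # quefrency (samples) -> Hz


# Synthetic voiced-like frame: harmonics of 200 Hz with 1/k amplitudes.
fs = 16000
f0_true = 200.0
n = np.arange(1024)
frame = sum(np.sin(2 * np.pi * k * f0_true * n / fs) / k for k in range(1, 40))

f0_est = cepstral_pitch(frame, fs)
print(f"estimated f0: {f0_est:.1f} Hz")
```

Applying `cepstral_pitch` to successive overlapping frames yields a simple pitch track. Stacking the per-frame cepstra over time gives a cepstrogram, the representation the abstract compares the reduced SQT output to.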
Source journal
Journal of Electrical and Electronics Engineering (Engineering: Electrical and Electronic Engineering)
CiteScore: 0.90
Self-citation rate: 0.00%
Articles published: 0
Review time: 16 weeks
Journal description: Journal of Electrical and Electronics Engineering is an interdisciplinary, application-oriented scientific publication that offers researchers and PhD students the opportunity to disseminate novel and original scientific and research contributions in the field of electrical and electronics engineering. Articles are reviewed by professionals, and papers are selected solely on the quality of their content according to the following criteria: the paper presents the authors' own research results; the paper and its content have not been submitted or published elsewhere; the paper is written in English; and the reference list includes papers published in recent years in the Journal of Electrical and Electronics Engineering that present similar research results. The topics and instructions for authors can be found in the appropriate sections of the journal.