Robust fundamental frequency-detection algorithm unaffected by the presence of hoarseness in human voice.

IF 2.1 | Zone 2, Physics and Astrophysics | Q2 ACOUSTICS | Journal of the Acoustical Society of America | Pub Date: 2024-12-01 | DOI: 10.1121/10.0034624
Itsuki Kitayama, Kiyohito Hosokawa, Shinobu Iwaki, Misao Yoshida, Akira Miyauchi, Toshihiro Kishikawa, Hidenori Tanaka, Takeshi Tsuda, Takashi Sato, Yukinori Takenaka, Makoto Ogawa, Hidenori Inohara
Journal of the Acoustical Society of America, volume 156, issue 6, pages 4217-4228. Publication type: Journal Article. Open access: no.
Citation count: 0

Abstract

The fundamental frequency (fo) is pivotal for quantifying vocal-fold characteristics. However, the accuracy of fo estimation in hoarse voices is notably low, and no definitive algorithm for fo estimation has previously been established. In this study, we introduce an algorithm named "Spectral-based fo Estimator Emphasized by Domination and Sequence" (SFEEDS), which enhances the spectrum method, and we compare it with conventional estimation methods. We analyzed 454 voice samples, calculating fo with both the conventional methods and SFEEDS. The ground truth of fo was defined as the lowest frequency within the most dominant harmonic complex observed on the spectrogram. We then assessed the concordance between each fo-estimation method and the fo ground truth, and examined how the accuracy of each method varied when analyzing speech with hoarseness. Regardless of hoarseness, fo-estimation accuracy was significantly greater with SFEEDS than with the conventional methods. Moreover, whereas the accuracy of the conventional methods was impaired in samples with roughness, the SFEEDS algorithm was robust and significantly reduced subharmonic errors. The SFEEDS fo-estimation algorithm accurately estimated the fo of both normal and hoarse voices.
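The abstract does not specify how concordance with the ground truth was scored, so the following is only an illustrative sketch of one way to separate correct estimates from subharmonic errors (an estimate landing near the true fo divided by an integer). The function names, the half-semitone tolerance, and the maximum divisor are assumptions made for this example and are not taken from the paper or from SFEEDS.

```python
import math


def classify_fo_estimate(estimate_hz, truth_hz, tol_semitones=0.5, max_divisor=4):
    """Label one fo estimate against its ground truth.

    Returns "correct", "subharmonic" (estimate near truth / k, k >= 2),
    "harmonic" (estimate near truth * k, k >= 2), or "other".
    tol_semitones and max_divisor are illustrative assumptions.
    """
    if estimate_hz <= 0 or truth_hz <= 0:
        return "other"

    def near(a, b):
        # Two frequencies match if they differ by at most tol_semitones.
        return abs(12.0 * math.log2(a / b)) <= tol_semitones

    if near(estimate_hz, truth_hz):
        return "correct"
    for k in range(2, max_divisor + 1):
        if near(estimate_hz, truth_hz / k):
            return "subharmonic"
        if near(estimate_hz, truth_hz * k):
            return "harmonic"
    return "other"


def summarize(estimates_hz, truths_hz):
    """Count labels over paired estimates and ground-truth values."""
    counts = {"correct": 0, "subharmonic": 0, "harmonic": 0, "other": 0}
    for est, truth in zip(estimates_hz, truths_hz):
        counts[classify_fo_estimate(est, truth)] += 1
    return counts
```

With a true fo of 200 Hz, an estimate of 100 Hz would be labeled a subharmonic error under these assumptions, which is the kind of error the abstract reports as being reduced for rough voices.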

Source journal
CiteScore: 4.60
Self-citation rate: 16.70%
Articles published: 1433
Review time: 4.7 months
About the journal: Since 1929 The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.