{"title":"一种不受人声沙哑影响的鲁棒基频检测算法。","authors":"Itsuki Kitayama, Kiyohito Hosokawa, Shinobu Iwaki, Misao Yoshida, Akira Miyauchi, Toshihiro Kishikawa, Hidenori Tanaka, Takeshi Tsuda, Takashi Sato, Yukinori Takenaka, Makoto Ogawa, Hidenori Inohara","doi":"10.1121/10.0034624","DOIUrl":null,"url":null,"abstract":"<p><p>The fundamental frequency (fo) is pivotal for quantifying vocal-fold characteristics. However, the accuracy of fo estimation in hoarse voices is notably low, and no definitive algorithm for fo estimation has been previously established. In this study, we introduce an algorithm named, \"Spectral-based fo Estimator Emphasized by Domination and Sequence (SFEEDS),\" which enhances the spectrum method and conducted comparative analyses with conventional estimation methods. We analyzed 454 voice samples and used conventional methods and SFEEDS to calculate fo. The ground truth of fo was determined as the lowest frequency within the most dominant harmonic complex observed on the spectrogram. Subsequently, we assessed the concordance between each fo-estimation method and the fo ground truth. We also examined the variations in the accuracy of these methods when analyzing speech with hoarseness. Regardless of hoarseness, the fo-estimation accuracy was significantly greater by SFEEDS than by conventional methods. Moreover, whereas the conventional methods impaired fo-estimation accuracy in samples with roughness, the SFEEDS algorithm was robust and significantly reduced subharmonic errors. The SFEEDS fo-estimation algorithm accurately estimated the fo of both normal and hoarse voices.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"156 6","pages":"4217-4228"},"PeriodicalIF":2.1000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Robust fundamental frequency-detection algorithm unaffected by the presence of hoarseness in human voice.\",\"authors\":\"Itsuki Kitayama, Kiyohito Hosokawa, Shinobu Iwaki, Misao Yoshida, Akira Miyauchi, Toshihiro Kishikawa, Hidenori Tanaka, Takeshi Tsuda, Takashi Sato, Yukinori Takenaka, Makoto Ogawa, Hidenori Inohara\",\"doi\":\"10.1121/10.0034624\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The fundamental frequency (fo) is pivotal for quantifying vocal-fold characteristics. However, the accuracy of fo estimation in hoarse voices is notably low, and no definitive algorithm for fo estimation has been previously established. In this study, we introduce an algorithm named, \\\"Spectral-based fo Estimator Emphasized by Domination and Sequence (SFEEDS),\\\" which enhances the spectrum method and conducted comparative analyses with conventional estimation methods. We analyzed 454 voice samples and used conventional methods and SFEEDS to calculate fo. The ground truth of fo was determined as the lowest frequency within the most dominant harmonic complex observed on the spectrogram. Subsequently, we assessed the concordance between each fo-estimation method and the fo ground truth. We also examined the variations in the accuracy of these methods when analyzing speech with hoarseness. Regardless of hoarseness, the fo-estimation accuracy was significantly greater by SFEEDS than by conventional methods. Moreover, whereas the conventional methods impaired fo-estimation accuracy in samples with roughness, the SFEEDS algorithm was robust and significantly reduced subharmonic errors. The SFEEDS fo-estimation algorithm accurately estimated the fo of both normal and hoarse voices.</p>\",\"PeriodicalId\":17168,\"journal\":{\"name\":\"Journal of the Acoustical Society of America\",\"volume\":\"156 6\",\"pages\":\"4217-4228\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2024-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Acoustical Society of America\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.1121/10.0034624\",\"RegionNum\":2,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Acoustical Society of America","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1121/10.0034624","RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ACOUSTICS","Score":null,"Total":0}
Robust fundamental frequency-detection algorithm unaffected by the presence of hoarseness in human voice.
The fundamental frequency (fo) is pivotal for quantifying vocal-fold characteristics. However, the accuracy of fo estimation in hoarse voices is notably low, and no definitive algorithm for fo estimation has been previously established. In this study, we introduce an algorithm named, "Spectral-based fo Estimator Emphasized by Domination and Sequence (SFEEDS)," which enhances the spectrum method and conducted comparative analyses with conventional estimation methods. We analyzed 454 voice samples and used conventional methods and SFEEDS to calculate fo. The ground truth of fo was determined as the lowest frequency within the most dominant harmonic complex observed on the spectrogram. Subsequently, we assessed the concordance between each fo-estimation method and the fo ground truth. We also examined the variations in the accuracy of these methods when analyzing speech with hoarseness. Regardless of hoarseness, the fo-estimation accuracy was significantly greater by SFEEDS than by conventional methods. Moreover, whereas the conventional methods impaired fo-estimation accuracy in samples with roughness, the SFEEDS algorithm was robust and significantly reduced subharmonic errors. The SFEEDS fo-estimation algorithm accurately estimated the fo of both normal and hoarse voices.
期刊介绍:
Since 1929 The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.