Can a Machine Distinguish High and Low Amount of Social Creak in Speech?

IF 2.5 | CAS Tier 4 (Medicine) | JCR Q1 (Audiology & Speech-Language Pathology) | Journal of Voice | Pub Date: 2024-10-24 | DOI: 10.1016/j.jvoice.2024.09.050
Anne-Maria Laukkanen, Sudarsana Reddy Kadiri, Shrikanth Narayanan, Paavo Alku
{"title":"机器能否区分语音中社交吱吱声的高低?","authors":"Anne-Maria Laukkanen, Sudarsana Reddy Kadiri, Shrikanth Narayanan, Paavo Alku","doi":"10.1016/j.jvoice.2024.09.050","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>Increased prevalence of social creak particularly among female speakers has been reported in several studies. The study of social creak has been previously conducted by combining perceptual evaluation of speech with conventional acoustical parameters such as the harmonic-to-noise ratio and cepstral peak prominence. In the current study, machine learning (ML) was used to automatically distinguish speech of low amount of social creak from speech of high amount of social creak.</p><p><strong>Methods: </strong>The amount of creak in continuous speech samples produced in Finnish by 90 female speakers was first perceptually assessed by two voice specialists. Based on their assessments, the speech samples were divided into two categories (low vs high amount of creak). Using the speech signals and their creak labels, seven different ML models were trained. Three spectral representations were used as feature for each model.</p><p><strong>Results: </strong>The results show that the best performance (accuracy of 71.1%) was obtained by the following two systems: an Adaboost classifier using the mel-spectrogram feature and a decision tree classifier using the mel-frequency cepstral coefficient feature.</p><p><strong>Conclusions: </strong>The study of social creak is becoming increasingly popular in sociolinguistic and vocological research. The conventional human perceptual assessment of the amount of creak is laborious and therefore ML technology could be used to assist researchers studying social creak. The classification systems reported in this study could be considered as baselines in future ML-based studies on social creak.</p>","PeriodicalId":49954,"journal":{"name":"Journal of Voice","volume":null,"pages":null},"PeriodicalIF":2.5000,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Can a Machine Distinguish High and Low Amount of Social Creak in Speech?\",\"authors\":\"Anne-Maria Laukkanen, Sudarsana Reddy Kadiri, Shrikanth Narayanan, Paavo Alku\",\"doi\":\"10.1016/j.jvoice.2024.09.050\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objectives: </strong>Increased prevalence of social creak particularly among female speakers has been reported in several studies. The study of social creak has been previously conducted by combining perceptual evaluation of speech with conventional acoustical parameters such as the harmonic-to-noise ratio and cepstral peak prominence. In the current study, machine learning (ML) was used to automatically distinguish speech of low amount of social creak from speech of high amount of social creak.</p><p><strong>Methods: </strong>The amount of creak in continuous speech samples produced in Finnish by 90 female speakers was first perceptually assessed by two voice specialists. Based on their assessments, the speech samples were divided into two categories (low vs high amount of creak). Using the speech signals and their creak labels, seven different ML models were trained. 
Three spectral representations were used as feature for each model.</p><p><strong>Results: </strong>The results show that the best performance (accuracy of 71.1%) was obtained by the following two systems: an Adaboost classifier using the mel-spectrogram feature and a decision tree classifier using the mel-frequency cepstral coefficient feature.</p><p><strong>Conclusions: </strong>The study of social creak is becoming increasingly popular in sociolinguistic and vocological research. The conventional human perceptual assessment of the amount of creak is laborious and therefore ML technology could be used to assist researchers studying social creak. The classification systems reported in this study could be considered as baselines in future ML-based studies on social creak.</p>\",\"PeriodicalId\":49954,\"journal\":{\"name\":\"Journal of Voice\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2024-10-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Voice\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.jvoice.2024.09.050\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Voice","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.jvoice.2024.09.050","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Objectives: An increased prevalence of social creak, particularly among female speakers, has been reported in several studies. Social creak has previously been studied by combining perceptual evaluation of speech with conventional acoustic parameters such as the harmonic-to-noise ratio and cepstral peak prominence. In the current study, machine learning (ML) was used to automatically distinguish speech with a low amount of social creak from speech with a high amount of social creak.

Methods: The amount of creak in continuous speech samples produced in Finnish by 90 female speakers was first assessed perceptually by two voice specialists. Based on their assessments, the speech samples were divided into two categories (low vs. high amount of creak). Using the speech signals and their creak labels, seven different ML models were trained, with three spectral representations used as features for each model.
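
Two of these spectral representations, the mel-spectrogram and the mel-frequency cepstral coefficients (MFCCs), are named in the Results below. As an illustration only, the following minimal sketch shows how such features might be extracted per recording; the paper does not publish code, and librosa, the parameter values, and the time-averaging step are assumptions rather than the authors' actual pipeline.

```python
# Minimal sketch: extract a mel-spectrogram and an MFCC representation from one
# speech recording and reduce each to a fixed-length, utterance-level vector.
# librosa and all parameter values here are assumptions (hypothetical setup).
import librosa

def extract_features(wav_path, sr=16000, n_mels=80, n_mfcc=13):
    """Return (mean log-mel vector, mean MFCC vector) for one recording."""
    y, sr = librosa.load(wav_path, sr=sr)

    # Log mel-spectrogram, shape (n_mels, n_frames)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)

    # MFCCs, shape (n_mfcc, n_frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    # Average over time so that recordings of different lengths each yield one
    # feature vector a standard classifier can consume.
    return log_mel.mean(axis=1), mfcc.mean(axis=1)
```

Averaging over time is only one of several ways to obtain a fixed-length input; frame-level modeling with pooling inside the classifier would be an equally plausible reading of the described setup.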

Results: The best performance (accuracy of 71.1%) was obtained by two systems: an AdaBoost classifier using the mel-spectrogram feature and a decision tree classifier using the mel-frequency cepstral coefficient (MFCC) feature.
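
A hedged sketch of how these two configurations could be reproduced with scikit-learn is shown below. The feature matrices and labels are random placeholders standing in for the per-speaker features and perceptual creak labels described in the Methods, and the hyperparameters and 5-fold cross-validation are assumptions rather than the authors' protocol.

```python
# Sketch of the two best-performing systems: AdaBoost on mel-spectrogram
# features and a decision tree on MFCC features. All data below are random
# placeholders; hyperparameters and the evaluation protocol are assumptions.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_mel = rng.normal(size=(90, 80))   # 90 speakers x 80 mel-band means (placeholder)
X_mfcc = rng.normal(size=(90, 13))  # 90 speakers x 13 MFCC means (placeholder)
y = rng.integers(0, 2, size=90)     # 0 = low creak, 1 = high creak (placeholder)

# AdaBoost classifier on mel-spectrogram features
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada_acc = cross_val_score(ada, X_mel, y, cv=5, scoring="accuracy").mean()

# Decision-tree classifier on MFCC features
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree_acc = cross_val_score(tree, X_mfcc, y, cv=5, scoring="accuracy").mean()

print(f"AdaBoost + mel-spectrogram accuracy: {ada_acc:.3f}")
print(f"Decision tree + MFCC accuracy:       {tree_acc:.3f}")
```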

Conclusions: The study of social creak is becoming increasingly popular in sociolinguistic and vocological research. Conventional perceptual assessment of the amount of creak by human listeners is laborious, so ML technology could be used to assist researchers studying social creak. The classification systems reported in this study can serve as baselines for future ML-based studies of social creak.

Source journal: Journal of Voice (Medicine - Otorhinolaryngology)
CiteScore: 4.00
Self-citation rate: 13.60%
Articles published: 395
Review time: 59 days
Journal description: The Journal of Voice is widely regarded as the world's premier journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.
Latest articles in this journal:
Does the Daily Practice of a Structured Voice Exercise Protocol Affect the Fitness Instructor's Self-Perceived Vocal Effort, Vocal Fatigue, and Voice Handicap?
Vocal Effort in Clinical Settings of North and South American Countries: Characterization From Argentinian, Chilean, Colombian, and the United States Clinician's Reports.
Anesthetic Techniques for Type-1 (Medialization) Thyroplasty: A Scoping Review.
Associations Between Immunological Biomarkers, Voice Use Patterns, and Phonotraumatic Vocal Fold Lesions: A Scoping Review.
Correlation Between Anxiety, Depression, and Self-Perceived Hoarseness: A Case Series of 100 Lebanese Patients.