Self-supervised Speech Models for Word-Level Stuttered Speech Detection

Yi-Jen Shih, Zoi Gkalitsiou, Alexandros G. Dimakis, David Harwath
{"title":"Self-supervised Speech Models for Word-Level Stuttered Speech Detection","authors":"Yi-Jen Shih, Zoi Gkalitsiou, Alexandros G. Dimakis, David Harwath","doi":"arxiv-2409.10704","DOIUrl":null,"url":null,"abstract":"Clinical diagnosis of stuttering requires an assessment by a licensed\nspeech-language pathologist. However, this process is time-consuming and\nrequires clinicians with training and experience in stuttering and fluency\ndisorders. Unfortunately, only a small percentage of speech-language\npathologists report being comfortable working with individuals who stutter,\nwhich is inadequate to accommodate for the 80 million individuals who stutter\nworldwide. Developing machine learning models for detecting stuttered speech\nwould enable universal and automated screening for stuttering, enabling speech\npathologists to identify and follow up with patients who are most likely to be\ndiagnosed with a stuttering speech disorder. Previous research in this area has\npredominantly focused on utterance-level detection, which is not sufficient for\nclinical settings where word-level annotation of stuttering is the norm. In\nthis study, we curated a stuttered speech dataset with word-level annotations\nand introduced a word-level stuttering speech detection model leveraging\nself-supervised speech models. Our evaluation demonstrates that our model\nsurpasses previous approaches in word-level stuttering speech detection.\nAdditionally, we conducted an extensive ablation analysis of our method,\nproviding insight into the most important aspects of adapting self-supervised\nspeech models for stuttered speech detection.","PeriodicalId":501284,"journal":{"name":"arXiv - EE - Audio and Speech Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - EE - Audio and Speech Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10704","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Clinical diagnosis of stuttering requires an assessment by a licensed speech-language pathologist. However, this process is time-consuming and requires clinicians with training and experience in stuttering and fluency disorders. Unfortunately, only a small percentage of speech-language pathologists report being comfortable working with individuals who stutter, which is inadequate to accommodate the 80 million individuals who stutter worldwide. Developing machine learning models for detecting stuttered speech would enable universal, automated screening for stuttering, allowing speech-language pathologists to identify and follow up with patients who are most likely to be diagnosed with a stuttering speech disorder. Previous research in this area has predominantly focused on utterance-level detection, which is not sufficient for clinical settings where word-level annotation of stuttering is the norm. In this study, we curated a stuttered speech dataset with word-level annotations and introduced a word-level stuttered speech detection model that leverages self-supervised speech models. Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttered speech detection. Additionally, we conducted an extensive ablation analysis of our method, providing insight into the most important aspects of adapting self-supervised speech models for stuttered speech detection.
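
The abstract does not spell out the model architecture, but the general idea of word-level detection on top of a self-supervised backbone can be sketched as follows. The snippet below is a minimal, hypothetical illustration assuming a WavLM backbone, word time spans obtained from forced alignment, mean pooling of frame features within each word, and a linear stutter/fluent classifier; none of these specifics are confirmed by the paper.

```python
# Hypothetical sketch of word-level stuttered speech detection on top of a
# self-supervised speech model. The WavLM backbone, the mean pooling over
# word-aligned frames, and the linear classifier are illustrative assumptions,
# not the paper's actual architecture.
import torch
import torch.nn as nn
from transformers import WavLMModel


class WordLevelStutterDetector(nn.Module):
    def __init__(self, backbone_name: str = "microsoft/wavlm-base-plus", num_classes: int = 2):
        super().__init__()
        self.backbone = WavLMModel.from_pretrained(backbone_name)
        hidden = self.backbone.config.hidden_size
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, waveform: torch.Tensor, word_spans: list[tuple[float, float]]) -> torch.Tensor:
        # waveform: (1, num_samples) mono audio at 16 kHz
        # word_spans: (start_sec, end_sec) per word, e.g. from a forced aligner
        frames = self.backbone(waveform).last_hidden_state  # (1, T, hidden)
        # WavLM's convolutional front end emits roughly one frame per 20 ms
        frames_per_sec = frames.size(1) / (waveform.size(1) / 16000)
        logits = []
        for start, end in word_spans:
            s = int(start * frames_per_sec)
            e = max(s + 1, int(end * frames_per_sec))
            word_repr = frames[:, s:e].mean(dim=1)      # pool frames inside the word
            logits.append(self.classifier(word_repr))   # per-word stutter/fluent logits
        return torch.cat(logits, dim=0)                 # (num_words, num_classes)
```

Given a 16 kHz waveform and aligned spans such as [(0.00, 0.42), (0.42, 0.95)], the forward pass returns one logit row per word, which is the granularity that word-level clinical annotation requires.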