Automatic Disfluency Detection From Untranscribed Speech

IF 4.1 2区 计算机科学 Q1 ACOUSTICS IEEE/ACM Transactions on Audio, Speech, and Language Processing Pub Date : 2024-10-23 DOI:10.1109/TASLP.2024.3485465
Amrit Romana;Kazuhito Koishida;Emily Mower Provost
{"title":"Automatic Disfluency Detection From Untranscribed Speech","authors":"Amrit Romana;Kazuhito Koishida;Emily Mower Provost","doi":"10.1109/TASLP.2024.3485465","DOIUrl":null,"url":null,"abstract":"Speech disfluencies, such as filled pauses or repetitions, are disruptions in the typical flow of speech. All speakers experience disfluencies at times, and the rate at which we produce disfluencies may be increased by certain speaker or environmental characteristics. Modeling disfluencies has been shown to be useful for a range of downstream tasks, and as a result, disfluency detection has many potential applications. In this work, we investigate language, acoustic, and multimodal methods for frame-level automatic disfluency detection and categorization. Each of these methods relies on audio as an input. First, we evaluate several automatic speech recognition (ASR) systems in terms of their ability to transcribe disfluencies, measured using disfluency error rates. We then use these ASR transcripts as input to a language-based disfluency detection model. We find that disfluency detection performance is largely limited by the quality of transcripts and alignments. We find that an acoustic-based approach that does not require transcription as an intermediate step outperforms the ASR language approach. Finally, we present multimodal architectures which we find improve disfluency detection performance over the unimodal approaches. Ultimately, this work introduces novel approaches for automatic frame-level disfluency and categorization. In the long term, this will help researchers incorporate automatic disfluency detection into a range of applications.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4727-4740"},"PeriodicalIF":4.1000,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10731569/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0

Abstract

Speech disfluencies, such as filled pauses or repetitions, are disruptions in the typical flow of speech. All speakers experience disfluencies at times, and the rate at which we produce disfluencies may be increased by certain speaker or environmental characteristics. Modeling disfluencies has been shown to be useful for a range of downstream tasks, and as a result, disfluency detection has many potential applications. In this work, we investigate language, acoustic, and multimodal methods for frame-level automatic disfluency detection and categorization. Each of these methods relies on audio as an input. First, we evaluate several automatic speech recognition (ASR) systems in terms of their ability to transcribe disfluencies, measured using disfluency error rates. We then use these ASR transcripts as input to a language-based disfluency detection model. We find that disfluency detection performance is largely limited by the quality of transcripts and alignments. We find that an acoustic-based approach that does not require transcription as an intermediate step outperforms the ASR language approach. Finally, we present multimodal architectures which we find improve disfluency detection performance over the unimodal approaches. Ultimately, this work introduces novel approaches for automatic frame-level disfluency and categorization. In the long term, this will help researchers incorporate automatic disfluency detection into a range of applications.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
从未记录的语音中自动检测不流畅语句
说话不流畅,如停顿或重复,是典型语流的中断。所有说话者有时都会出现语无伦次的情况,某些说话者或环境特征可能会增加我们产生语无伦次的速度。对不连贯现象进行建模已被证明对一系列下游任务有用,因此,不连贯现象检测有许多潜在的应用。在这项工作中,我们研究了语言、声学和多模态方法,用于帧级不流畅语自动检测和分类。每种方法都依赖音频作为输入。首先,我们评估了几种自动语音识别(ASR)系统转录不流畅语句的能力,衡量标准是不流畅语句错误率。然后,我们将这些 ASR 转录结果作为基于语言的不流畅检测模型的输入。我们发现,不流畅语检测性能在很大程度上受到转录本和对齐质量的限制。我们发现,无需转录作为中间步骤的声学方法优于 ASR 语言方法。最后,我们提出了多模态架构,发现这种架构比单模态方法更能提高不流利检测性能。最终,这项工作为自动帧级不流畅和分类引入了新方法。从长远来看,这将有助于研究人员将不流畅自动检测纳入一系列应用中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE/ACM Transactions on Audio, Speech, and Language Processing ACOUSTICS-ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore
11.30
自引率
11.10%
发文量
217
期刊介绍: The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.
期刊最新文献
Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features FxLMS/F Based Tap Decomposed Adaptive Filter for Decentralized Active Noise Control System MRC-PASCL: A Few-Shot Machine Reading Comprehension Approach via Post-Training and Answer Span-Oriented Contrastive Learning Knowledge-Guided Transformer for Joint Theme and Emotion Classification of Chinese Classical Poetry WEDA: Exploring Copyright Protection for Large Language Model Downstream Alignment
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1