Acoustic Stress Detection in Isolated English Words for Computer-Assisted Pronunciation Training

Vera Bernhard, Sandra Schwab, J. Goldman
{"title":"Acoustic Stress Detection in Isolated English Words for Computer-Assisted Pronunciation Training","authors":"Vera Bernhard, Sandra Schwab, J. Goldman","doi":"10.21437/interspeech.2022-197","DOIUrl":null,"url":null,"abstract":"We propose a system for automatic lexical stress detection in isolated English words. It is designed to be part of the computer-assisted pronunciation training application MIAPARLE (“https://miaparle.unige.ch”) that specifically focuses on stress contrasts acquisition. Training lexical stress cannot be disregarded in language education as the accuracy in production highly affects the intelligibility and perceived fluency of an L2 speaker. The pipeline automatically segments audio input into syllables over which duration, intensity, pitch, and spectral information is calculated. Since the stress of a syllable is defined relative to its neighboring syllables, the values obtained over the syllables are complemented with differential values to the preceding and following syllables. The resulting feature vectors, retrieved from 1011 recordings of single words spoken by English natives, are used to train a Voting Classifier composed of four supervised classifiers, namely a Support Vector Machine, a Neural Net, a K Nearest Neighbor, and a Random Forest classifier. The approach determines syllables of a single word as stressed or unstressed with an F1 score of 94% and an accuracy of 96%.","PeriodicalId":73500,"journal":{"name":"Interspeech","volume":"1 1","pages":"3143-3147"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interspeech","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/interspeech.2022-197","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We propose a system for automatic lexical stress detection in isolated English words. It is designed to be part of the computer-assisted pronunciation training application MIAPARLE (“https://miaparle.unige.ch”) that specifically focuses on stress contrasts acquisition. Training lexical stress cannot be disregarded in language education as the accuracy in production highly affects the intelligibility and perceived fluency of an L2 speaker. The pipeline automatically segments audio input into syllables over which duration, intensity, pitch, and spectral information is calculated. Since the stress of a syllable is defined relative to its neighboring syllables, the values obtained over the syllables are complemented with differential values to the preceding and following syllables. The resulting feature vectors, retrieved from 1011 recordings of single words spoken by English natives, are used to train a Voting Classifier composed of four supervised classifiers, namely a Support Vector Machine, a Neural Net, a K Nearest Neighbor, and a Random Forest classifier. The approach determines syllables of a single word as stressed or unstressed with an F1 score of 94% and an accuracy of 96%.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
计算机辅助发音训练中孤立英语单词的声重音检测
我们提出了一个英语孤立词的自动词法重音检测系统。它被设计成计算机辅助发音训练应用程序MIAPARLE(“https://miaparle.unige.ch”)的一部分,特别侧重于重读对比习得。训练词汇重音在语言教育中不可忽视,因为词汇重音的准确性对二语说话者的可理解性和感知流畅性有很大影响。该管道自动将音频输入分割成音节,并计算其持续时间、强度、音高和频谱信息。由于一个音节的重音是相对于它的邻近音节来定义的,因此在音节上获得的值与前一个音节和后一个音节的值相辅相成。所得到的特征向量是从1011个英语母语者说的单个单词的录音中检索出来的,用于训练由四个监督分类器组成的投票分类器,即支持向量机、神经网络、K近邻和随机森林分类器。该方法确定单个单词的音节是重读还是非重读,F1得分为94%,准确率为96%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Remote Assessment for ALS using Multimodal Dialog Agents: Data Quality, Feasibility and Task Compliance. Pronunciation modeling of foreign words for Mandarin ASR by considering the effect of language transfer VCSE: Time-Domain Visual-Contextual Speaker Extraction Network Induce Spoken Dialog Intents via Deep Unsupervised Context Contrastive Clustering Nasal Coda Loss in the Chengdu Dialect of Mandarin: Evidence from RT-MRI
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1