Acoustic Stress Detection in Isolated English Words for Computer-Assisted Pronunciation Training

Interspeech, pp. 3143-3147 | Pub Date: 2022-09-18 | DOI: 10.21437/interspeech.2022-197

Vera Bernhard, Sandra Schwab, J. Goldman



Abstract

We propose a system for automatic lexical stress detection in isolated English words. It is designed as part of the computer-assisted pronunciation training application MIAPARLE (https://miaparle.unige.ch), which focuses specifically on the acquisition of stress contrasts. Lexical stress training cannot be neglected in language education, because accuracy in stress production strongly affects the intelligibility and perceived fluency of an L2 speaker. The pipeline automatically segments the audio input into syllables, over which duration, intensity, pitch, and spectral information are calculated. Since the stress of a syllable is defined relative to its neighboring syllables, the values obtained for each syllable are complemented with differential values relative to the preceding and following syllables. The resulting feature vectors, extracted from 1011 recordings of single words spoken by native English speakers, are used to train a Voting Classifier composed of four supervised classifiers: a Support Vector Machine, a Neural Network, a K-Nearest Neighbors classifier, and a Random Forest classifier. The approach classifies the syllables of a single word as stressed or unstressed with an F1 score of 94% and an accuracy of 96%.
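The two ideas in the abstract that lend themselves to illustration are the differential (context) features and the combination of several classifiers by voting. The following is a minimal sketch of both, not the authors' implementation. The feature names, the numeric values, the zero padding at word edges, and the use of a hard majority vote are all assumptions made for illustration; the abstract does not specify how edge syllables are handled or which voting scheme is used.

```python
from collections import Counter
from typing import Dict, List

Syllable = Dict[str, float]


def add_context_features(syllables: List[Syllable], pad: float = 0.0) -> List[Syllable]:
    """Complement each syllable's values with differences to its neighbors.

    For edge syllables, the missing neighbor difference is set to `pad`
    (an assumption; the abstract does not say how edges are treated).
    """
    last = len(syllables) - 1
    rows = []
    for i, syl in enumerate(syllables):
        row = dict(syl)  # keep the raw per-syllable values
        for name, value in syl.items():
            # Difference to the preceding syllable.
            row[f"{name}_dprev"] = value - syllables[i - 1][name] if i > 0 else pad
            # Difference to the following syllable.
            row[f"{name}_dnext"] = value - syllables[i + 1][name] if i < last else pad
        rows.append(row)
    return rows


def majority_vote(predictions: List[str]) -> str:
    """Hard vote over classifier outputs (assumed scheme).

    Ties are resolved in favor of the label seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


# Invented values for a two-syllable word; not data from the paper.
word = [
    {"duration": 0.12, "intensity": 62.0, "pitch": 180.0},
    {"duration": 0.25, "intensity": 70.0, "pitch": 220.0},
]
features = add_context_features(word)

# Hypothetical outputs of four classifiers for the second syllable.
label = majority_vote(["stressed", "unstressed", "stressed", "stressed"])
```

Each syllable ends up with its raw values plus two differences per feature, so a classifier sees how the syllable compares with its neighbors rather than only its absolute measurements.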
