Assessment of L2 intelligibility: Comparing L1 listeners and automatic speech recognition

IF 5.7 | Zone 1 (Literature) | Q1 Education & Educational Research | Recall | Pub Date: 2022-11-18 | DOI: 10.1017/S0958344022000192

Solène Inceoglu, Wen-Hsin Chen, Hyojung Lim

Recall, 35(1), 89-104.

Citations: 1

Abstract

An increasing number of studies are exploring the benefits of automatic speech recognition (ASR)–based dictation programs for second language (L2) pronunciation learning (e.g. Chen, Inceoglu & Lim, 2020; Liakin, Cardoso & Liakina, 2015; McCrocklin, 2019), but how ASR recognizes accented speech and the nature of the feedback it provides to language learners are still largely under-researched. The current study explores whether the intelligibility of L2 speakers differs when assessed by native (L1) listeners versus ASR technology, and reports on the types of intelligibility issues encountered by the two groups. Twelve L1 listeners of English transcribed 48 isolated words targeting the /ɪ-i/ and /æ-ε/ contrasts and 24 short sentences that four Taiwanese intermediate learners of English had produced using Google’s ASR dictation system. Overall, the results revealed lower intelligibility scores for the word task (ASR: 40.81%, L1 listeners: 38.62%) than for the sentence task (ASR: 75.52%, L1 listeners: 83.88%), and highlighted strong similarities in the error types – and their proportions – identified by ASR and the L1 listeners. However, despite similar recognition scores, correlations indicated that the ASR recognition of the L2 speakers’ oral productions mirrored the L1 listeners’ judgments of intelligibility in both the word and sentence tasks for only one speaker, with significant positive correlations for one additional speaker in each task. This suggests that the extent to which ASR approaches L1 listeners at recognizing accented speech may depend on individual speakers and the type of oral speech.
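As a rough illustration of the two quantities the abstract reports (percentage intelligibility scores and correlations between ASR and listener results), the sketch below scores a few invented items and correlates the per-item results. Everything in it is an assumption made for illustration: the toy data, the exact-match scoring rule, the helper names `intelligibility` and `pearson`, and the choice of Pearson's r. The abstract does not state which scoring rule or correlation statistic the authors used.

```python
from math import sqrt

def intelligibility(transcriptions, target):
    """Proportion of transcriptions that match the target exactly (case-insensitive).
    Exact-match scoring is an assumption, not the authors' documented rule."""
    hits = sum(t.strip().lower() == target.lower() for t in transcriptions)
    return hits / len(transcriptions)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: (target word, listener transcriptions, single ASR output).
items = [
    ("ship",  ["ship", "sheep", "ship", "ship"],   "sheep"),
    ("sheep", ["sheep", "sheep", "sheep", "ship"], "sheep"),
    ("bad",   ["bed", "bed", "bad", "bed"],        "bed"),
    ("bed",   ["bed", "bed", "bed", "bed"],        "bed"),
]

# Per-item scores: share of listeners who got the word, and whether ASR got it (0 or 1).
listener_scores = [intelligibility(heard, target) for target, heard, _ in items]
asr_scores = [intelligibility([asr], target) for target, _, asr in items]

# Task-level intelligibility is the mean of the per-item scores.
listener_mean = sum(listener_scores) / len(listener_scores)
asr_mean = sum(asr_scores) / len(asr_scores)

# Item-level agreement between ASR and listeners for this (hypothetical) speaker.
r = pearson(listener_scores, asr_scores)

print(f"listeners: {listener_mean:.2%}, ASR: {asr_mean:.2%}, r = {r:.2f}")
```

The sketch makes one point from the abstract concrete: the two task-level means can be close while the item-level correlation for a given speaker is weak, because the means ignore which items each rater got wrong.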

Source journal

Recall Multiple-

CiteScore

8.50

Self-citation rate

4.40%

Number of publications