The impact of non-native English speakers’ phonological and prosodic features on automatic speech recognition accuracy
Authors: Ingy Farouk Emara, Nabil Hamdy Shaker
Journal: Speech Communication, volume 157, Article 103038
Publication date: 2024-01-13
Publication type: Journal Article
DOI: 10.1016/j.specom.2024.103038
URL: https://www.sciencedirect.com/science/article/pii/S0167639324000104
Impact factor: 2.4 (JCR Q2, Acoustics)
Citations: 0
Abstract
The present study examines the impact of Arab speakers’ phonological and prosodic features on the accuracy of automatic speech recognition (ASR) of non-native English speech. The authors first investigated the perceptions of 30 Egyptian ESL teachers and 70 Egyptian university students towards the L1 (Arabic)-based errors affecting intelligibility and then analysed the ASR output for the students’ English speech to find out whether the errors investigated resulted in intelligibility breakdowns in an ASR setting. In terms of the phonological features of non-native speech, the results showed that the teachers gave more weight to pronunciation features of accented speech that did not actually hinder recognition, that the students were mostly oblivious to the L2 errors they made and their impact on intelligibility, and that L2 errors which were not perceived as serious by either teachers or students had negative impacts on ASR accuracy levels. With regard to the prosodic features of non-native speech, it was found that lower speech rates resulted in more accurate speech recognition, higher speech intensity led to fewer deletion errors, and voice pitch did not seem to have any impact on ASR accuracy levels. Accordingly, the study recommends training ASR systems with more non-native data to increase their accuracy levels, as well as paying more attention to remedying the L1-based errors of non-native speakers that are most likely to impact automatic recognition of non-native speech.
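The abstract refers to ASR accuracy and to deletion errors without stating how they were scored. As an illustration only, the sketch below shows one common way such figures are obtained: a word-level edit-distance alignment between a reference transcript and the ASR output, which yields substitution, deletion and insertion counts and a word error rate (WER). The function name `count_asr_errors` and the example sentences are hypothetical and are not taken from the study.

```python
def count_asr_errors(reference: str, hypothesis: str) -> dict:
    """Align two transcripts word by word and count ASR error types.

    Assumption: errors are scored with a standard minimum-edit-distance
    alignment; the study's actual scoring procedure is not given in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # dp[i][j] holds (total_cost, substitutions, deletions, insertions)
    # for the best alignment of ref[:i] against hyp[:j].
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost, subs, dels, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (cost, subs, dels, ins)          # correct word
            else:
                diagonal = (cost + 1, subs + 1, dels, ins)  # substitution
            cost, subs, dels, ins = dp[i - 1][j]
            deletion = (cost + 1, subs, dels + 1, ins)
            cost, subs, dels, ins = dp[i][j - 1]
            insertion = (cost + 1, subs, dels, ins + 1)
            # Keep the cheapest of the three candidate alignments.
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])

    cost, subs, dels, ins = dp[len(ref)][len(hyp)]
    wer = cost / len(ref) if ref else 0.0
    return {"substitutions": subs, "deletions": dels, "insertions": ins, "wer": wer}


if __name__ == "__main__":
    # Hypothetical example: the recogniser drops the word "the".
    print(count_asr_errors("please open the window", "please open window"))
```

Under this kind of scoring, a recogniser that drops a quietly spoken word registers a deletion, whereas a mispronounced word that is recognised as a different word registers a substitution.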
About the journal:
Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results.
The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.