Learnability of English diphthongs: One dynamic target vs. two static targets

Speech Communication; IF 3.0; Zone 3 (Computer Science); JCR Q2 (Acoustics); Pub Date: 2025-05-01; Epub Date: 2025-03-05; DOI: 10.1016/j.specom.2025.103225
Anqi Xu, Daniel R. van Niekerk, Branislav Gerazov, Paul Konstantin Krug, Santitham Prom-on, Peter Birkholz, Yi Xu
Speech Communication, volume 170, Article 103225. Full text: https://www.sciencedirect.com/science/article/pii/S0167639325000408

Abstract

As vowels with intrinsic movements, diphthongs are among the most elusive sounds of speech. Previous research has characterized diphthongs as a combination of two vowels, a vowel followed by a formant transition, or a constant rate of formant change. These accounts are based on acoustic patterns, perceptual cues, and either acoustic or articulatory synthesis, but no consensus has been reached. In this study, we investigate the nature of diphthongs by exploring how they can be acquired through vocal learning. The acquisition is simulated with a three-dimensional (3D) vocal tract model with built-in target approximation dynamics, which can learn articulatory targets of phonetic categories under the guidance of a speech recognizer. The simulation attempted to learn to articulate monosyllabic English words containing diphthongs with either a single dynamic target or two static targets, and the learned synthetic words were presented to native listeners for identification. The results showed that diphthongs learned with a single dynamic target were consistently more intelligible across variable durations than those learned with two static targets, with the sole exception of /aɪ/. From the perspective of learnability, therefore, English diphthongs are likely unitary vowels with dynamic targets.
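
The contrast between the two hypotheses can be illustrated with a toy sketch. The code below is not the authors' vocal tract model or learning procedure; it is a minimal illustration that assumes a single articulatory parameter in arbitrary units, a third-order critically damped approach to a linear target (one common formulation of target approximation), and hypothetical values for the onset, the targets, the rate constant `lam`, and the duration. A "dynamic" target is modeled here as a target that itself moves linearly from `start` to `end`; two "static" targets are modeled as two fixed values approached in sequence, with the articulatory state carried across the boundary.

```python
import math


def approach(y0, v0, a0, slope, offset, lam, duration, dt=0.005):
    """Approach the linear target slope*t + offset from the initial state
    (y0, v0, a0) with a third-order critically damped response.

    Returns the sampled trajectory and the final (value, velocity,
    acceleration), so the state can be handed to the next target.
    """
    # Coefficients of the transient (c1 + c2*t + c3*t^2) * exp(-lam*t),
    # chosen so that value, velocity and acceleration match at t = 0.
    c1 = y0 - offset
    c2 = v0 - slope + lam * c1
    c3 = (a0 + 2 * lam * c2 - lam ** 2 * c1) / 2

    def state(t):
        e = math.exp(-lam * t)
        p = c1 + c2 * t + c3 * t * t
        dp = c2 + 2 * c3 * t
        ddp = 2 * c3
        y = slope * t + offset + p * e
        v = slope + (dp - lam * p) * e
        a = (ddp - 2 * lam * dp + lam ** 2 * p) * e
        return y, v, a

    n = int(round(duration / dt))
    samples = [state(i * dt)[0] for i in range(n + 1)]
    return samples, state(duration)


def one_dynamic_target(onset, start, end, duration, lam=40.0):
    """Single target that moves linearly from start to end (assumed form)."""
    slope = (end - start) / duration
    samples, _ = approach(onset, 0.0, 0.0, slope, start, lam, duration)
    return samples


def two_static_targets(onset, start, end, duration, lam=40.0):
    """Two fixed targets, each given half of the total duration."""
    half = duration / 2
    first, (y, v, a) = approach(onset, 0.0, 0.0, 0.0, start, lam, half)
    second, _ = approach(y, v, a, 0.0, end, lam, half)
    # Drop the duplicated boundary sample when joining the two halves.
    return first + second[1:]


if __name__ == "__main__":
    # Hypothetical values: onset 0.5, low-to-high movement over 300 ms.
    dynamic = one_dynamic_target(0.5, 0.0, 1.0, 0.3)
    static = two_static_targets(0.5, 0.0, 1.0, 0.3)
    print("dynamic target, final value:", round(dynamic[-1], 3))
    print("two static targets, final value:", round(static[-1], 3))
```

Shortening `duration` in this sketch shows how each hypothesis responds to time pressure, which is the kind of variation the listening test probed with variable durations. The sketch says nothing about intelligibility itself.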
Source journal
Speech Communication (Engineering and Technology; Computer Science, Interdisciplinary Applications)
CiteScore: 6.80
Self-citation rate: 6.20%
Articles published: 94
Review time: 19.2 weeks
Journal description: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.