
Journal of Phonetics: Latest Articles

Imitation of F0 tone contours by Mandarin and English speakers is both categorical and continuous
IF 2.4; Zone 1 (Literature); LANGUAGE & LINGUISTICS; Pub Date: 2025-11-01; DOI: 10.1016/j.wocn.2025.101457
Wei Zhang, Meghan Clayards, Morgan Sonderegger
Native speakers imitate F0 contours that vary between two lexical tones non-linearly: they do not precisely reproduce the presented F0 features but instead cluster them toward tonal categories, the so-called contrast mediation effect. However, less is known about whether non-native speakers who lack the lexical tone phonology will show linear imitation of F0 contours. Addressing this question will deepen our understanding of whether F0 imitation is solely influenced by lexical tone contrasts or also shaped by other sources of non-linearity beyond phonological contrasts. To investigate this, the current study examined the categorization and imitation of a Mandarin flat-falling tonal continuum by both Mandarin speakers and English speakers who were naïve to tonal languages. Imitation distributions were analyzed by comparing two models: a linear regression model, which assumes participants linearly track phonetic cues, and a mixture regression model, which assumes imitation reflects underlying categories. The mixture regression model fit the data better for the Mandarin speakers while the reverse was true for the English speakers, suggesting that Mandarin speakers imitated the F0 contours more categorically than English speakers. However, for both groups, the data were best fit using a weighted combination of both models. For the Mandarin group, this result, along with additional analyses of duration, F1, and intensity, suggests that tone categories involve both phonological and phonetic information and imitation taps both, possibly via hyper- and hypo-articulation. For English participants, the evidence for categorical mediation suggests that imitation is mediated by factors other than lexically contrastive linguistic categories, although the exact nature of the factors is unclear.
Journal of Phonetics, Volume 113, Article 101457. Publication type: Journal Article.
Citations: 0
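The model comparison described in the abstract above can be illustrated with a small simulation. This is a sketch under stated assumptions and not the authors' analysis: the data are simulated, the mixture model is deliberately simplified (two fixed category means, a shared standard deviation, and one mixing weight per continuum step, fitted by EM), and BIC is used as the comparison criterion. The function names `simulate`, `linear_bic`, and `mixture_bic` are hypothetical.

```python
import numpy as np

def gaussian_pdf(y, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better penalised fit."""
    return n_params * np.log(n_obs) - 2.0 * loglik

def linear_bic(step, y):
    """Linear model: the imitated value tracks the stimulus step (intercept, slope, sd)."""
    X = np.column_stack([np.ones(len(step)), step])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sd = np.sqrt(np.mean(resid ** 2))  # maximum-likelihood estimate of the residual sd
    loglik = np.sum(np.log(gaussian_pdf(resid, 0.0, sd)))
    return bic(loglik, 3, len(y))

def mixture_bic(step, y, n_iter=200):
    """Simplified mixture: two category means, shared sd, one mixing weight per step."""
    levels, idx = np.unique(step, return_inverse=True)
    mu = np.array([np.percentile(y, 90), np.percentile(y, 10)])  # crude starting values
    sd = np.std(y) / 2.0
    w = np.full(len(levels), 0.5)
    for _ in range(n_iter):
        # E-step: responsibility of category 1 for each response
        p1 = w[idx] * gaussian_pdf(y, mu[0], sd)
        p2 = (1.0 - w[idx]) * gaussian_pdf(y, mu[1], sd)
        r = p1 / (p1 + p2)
        # M-step: update category means, shared sd, and per-step weights
        mu = np.array([np.sum(r * y) / np.sum(r),
                       np.sum((1.0 - r) * y) / np.sum(1.0 - r)])
        sd = np.sqrt(np.mean(r * (y - mu[0]) ** 2 + (1.0 - r) * (y - mu[1]) ** 2))
        w = np.array([r[idx == k].mean() for k in range(len(levels))])
    lik = (w[idx] * gaussian_pdf(y, mu[0], sd)
           + (1.0 - w[idx]) * gaussian_pdf(y, mu[1], sd))
    return bic(np.sum(np.log(lik)), 2 + 1 + len(levels), len(y))

def simulate(kind, seed=1, n_rep=40):
    """Simulated imitations on a 7-step continuum (arbitrary units, not real data)."""
    rng = np.random.default_rng(seed)
    step = np.repeat(np.arange(1, 8), n_rep).astype(float)
    noise = rng.normal(0.0, 0.5, step.size)
    if kind == "categorical":
        # Responses jump between two category targets; only the choice depends on step
        p_flat = 1.0 / (1.0 + np.exp(1.5 * (step - 4.0)))
        y = np.where(rng.random(step.size) < p_flat, 0.0, -6.0) + noise
    else:
        # Responses follow the stimulus step linearly
        y = -(step - 1.0) + noise
    return step, y

for kind in ("categorical", "linear"):
    step, y = simulate(kind)
    print(kind, "linear BIC:", round(linear_bic(step, y), 1),
          "mixture BIC:", round(mixture_bic(step, y), 1))
```

In this toy setting the mixture model should obtain the lower BIC for the simulated categorical imitators and the linear model for the simulated linear imitators. The weighted combination of both models reported in the abstract is not implemented here.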
F0 derivatives in the classification of meaningful tonal movements
IF 2.4; Zone 1 (Literature); LANGUAGE & LINGUISTICS; Pub Date: 2025-11-01; DOI: 10.1016/j.wocn.2025.101454
Constantijn Kaland
Recent work applied cluster analysis on f0 contours in order to find ‘prototypical’ or ‘underlying’ categories as assumed in intonational phonology. However, it remains to be answered to what extent meaningful f0 variation can indeed be captured using automatic classification of surface realizations. Studies on f0 dynamics have suggested that derivatives (e.g., f0 velocity, acceleration and jerk) closely approximate the meaningful components of f0. The question answered in this study is to what extent f0 derivatives are more informative for cluster analysis than other metrics, such as the (time series) f0 contour they are derived from, a static measure representing it, or other acoustic measures such as intensity and duration. This is tested across two clustering techniques (hierarchical and k-medoids) for three different meaningful features expressed in Dutch noun phrases (of the type ‘blue sofa’): focus type (broad, narrow), focus position (adjective, noun) and phrase position (medial, final). Results show that derivatives are among the most informative acoustic measures, although the best performing cluster analyses are the ones based on multiple acoustic measures. Crucially, cluster analyses reveal that the different meaningful prosodic features each have their own characteristics in terms of acoustics and number of clusters.
Journal of Phonetics, Volume 113, Article 101454. Publication type: Journal Article.
Citations: 0
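The f0 derivatives named in the abstract above (velocity, acceleration, jerk) are the first, second, and third time derivatives of the contour. A minimal sketch of how they could be approximated from a sampled contour follows; it assumes finite differences via `numpy.gradient`, which may differ from the smoothing or differentiation procedure used in the study, and the function name `f0_derivatives` and the toy contour are hypothetical.

```python
import numpy as np

def f0_derivatives(f0, times):
    """Approximate velocity, acceleration and jerk of an f0 contour.

    f0    : f0 samples (e.g. in Hz or semitones)
    times : sample times in seconds, same length as f0
    """
    f0 = np.asarray(f0, dtype=float)
    times = np.asarray(times, dtype=float)
    velocity = np.gradient(f0, times)            # first derivative
    acceleration = np.gradient(velocity, times)  # second derivative
    jerk = np.gradient(acceleration, times)      # third derivative
    return velocity, acceleration, jerk

# Toy contour: a linear fall from 200 Hz at 100 Hz per second over 300 ms
times = np.linspace(0.0, 0.3, 31)
f0 = 200.0 - 100.0 * times
velocity, acceleration, jerk = f0_derivatives(f0, times)
print(velocity[:3])  # constant slope of the fall
```

Each differentiation step amplifies measurement noise, so real contours are typically smoothed first. The resulting derivative series can then be fed to a clustering procedure in place of, or alongside, the raw contour.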
How pitch gestures facilitate L2 lexical tone learning: The role of L1–L2 perceptual assimilation in Mandarin speakers’ acquisition of Thai tones
IF 2.4; Zone 1 (Literature); LANGUAGE & LINGUISTICS; Pub Date: 2025-11-01; DOI: 10.1016/j.wocn.2025.101460
Keke Yu, Jie Zhang, Zilong Li, Xuliang Zhang, Yiyuan He, Li Li, Ruiming Wang
Lexical tone acquisition is a significant yet challenging aspect of learning a tonal language as a second language (L2). Embodied cognition theory offers a promising perspective by highlighting the role of pitch gestures in L2 lexical tone learning. Nevertheless, how pitch gestures facilitate L2 lexical tone acquisition, particularly in relation to native language (L1)–L2 perceptual assimilation patterns, remains poorly understood. The present study recruited native Mandarin speakers to learn Thai lexical tones. First, we classified different types of Mandarin-Thai tone assimilation before learning. Next, we employed tone discrimination and identification tasks to investigate how pitch gestures facilitate the learning of Thai lexical tones with varying assimilation patterns. We compared three learning approaches: pitch gesture production, pitch feature observation, and word-picture association. The results revealed three Mandarin-Thai tone assimilation patterns: the Mid and Low Thai tones were assimilated to Mandarin Tone 1; the Falling Thai tone was assimilated to Mandarin Tone 4; the High and Rising Thai tones were assimilated to Mandarin Tone 2. Notably, the pitch gesture production approach enhanced learners’ ability to discriminate between Thai tones assimilated to different Mandarin tones, and to identify Thai tones assimilated to Mandarin Tone 1 (Mid/Low) and Tone 4 (Falling). These findings indicate that while embodied experience through pitch gesture production facilitates L2 lexical tone acquisition, its efficacy is modulated by L1–L2 perceptual assimilation patterns. Based on these results, we propose an embodied learning viewpoint that incorporates L1 tonal experience, offering new insights into L2 lexical tone acquisition.
Journal of Phonetics, Volume 113, Article 101460. Publication type: Journal Article.
Citations: 0
Bayesian beta regressions with brms in R: A tutorial for phoneticians
IF 2.4; Zone 1 (Literature); LANGUAGE & LINGUISTICS; Pub Date: 2025-11-01; DOI: 10.1016/j.wocn.2025.101455
Stefano Coretta, Paul Bürkner
Phonetic research frequently involves analyzing numeric continuous outcome variables, such as durations, frequencies, loudness, and ratios. Another commonly used outcome type is proportions, including measures like the proportion of voicing during closure, gesture amplitude, and nasalance. Despite their bounded nature, proportions are often modeled using Gaussian regression, largely due to the default settings of commonly used statistical functions in R (e.g., lm() and lmer() from lme4). This practice persists in teaching and research, despite the fact that Gaussian models assume unbounded continuous data and may poorly fit proportion data. To address this issue, this tutorial introduces beta regression models, a more appropriate statistical approach for analyzing proportions. The beta distribution provides a flexible framework for modeling continuous data constrained between 0 and 1. The tutorial employs the brms package in R and assumes familiarity with regression modeling but no prior knowledge of Bayesian statistics. The tutorial includes two case studies illustrating the practical implementation of Bayesian beta regression models. Data and code are available at https://github.com/stefanocoretta/beta-phon.
Journal of Phonetics, Volume 113, Article 101455. Publication type: Journal Article.
Citations: 0
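To make the idea behind the tutorial above concrete, the following sketch evaluates a beta regression likelihood by hand. It is not the tutorial's code, which is written in R with brms and is available at the repository linked in the abstract. The sketch assumes the mean-precision parameterization commonly used for beta regression, Beta(mu * phi, (1 - mu) * phi), with a logit link on the mean; the function names and the single-predictor model are hypothetical.

```python
import math

def inv_logit(eta):
    """Map a linear predictor on the real line to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_logpdf(y, mu, phi):
    """Log density of Beta(mu * phi, (1 - mu) * phi) at y.

    mu is the mean (0 < mu < 1) and phi > 0 is the precision.
    Values outside (0, 1) have zero density, unlike under a Gaussian model.
    """
    if not 0.0 < y < 1.0:
        return float("-inf")
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)

def beta_regression_loglik(ys, xs, b0, b1, phi):
    """Log-likelihood of proportions ys given one predictor xs (hypothetical model)."""
    return sum(beta_logpdf(y, inv_logit(b0 + b1 * x), phi) for y, x in zip(ys, xs))

# Illustrative values only: three voicing proportions and a centred predictor
print(beta_regression_loglik([0.2, 0.5, 0.8], [-1.0, 0.0, 1.0], b0=0.0, b1=1.2, phi=10.0))
```

Because the mean is modeled on the logit scale, predicted proportions stay inside (0, 1) for any coefficient values. Observations of exactly 0 or 1 have zero density under this likelihood and need separate handling.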
Cross-linguistic similarity in L2 suprasegmental learning: evidence from Chinese learners’ perception of Japanese pitch accents
IF 2.4; Zone 1 (Literature); LANGUAGE & LINGUISTICS; Pub Date: 2025-11-01; DOI: 10.1016/j.wocn.2025.101458
Yu Yang, Longjie Dong, Quansheng Xia, Yuxiao Yang, Fei Chen
The acquisition of suprasegmental features in a second language (L2), like lexical tone and pitch accent, can be challenging, yet the impact of cross-linguistic similarity on learning these suprasegmental features has been underexplored. This study explored the role of cross-linguistic similarity in Chinese learners’ perception of Japanese pitch accents, aiming to verify the Perceptual Assimilation Model for Suprasegmentals (PAM-S). In experiment 1, 25 Chinese learners of Japanese with lower proficiency level and 24 learners with higher proficiency level completed a perceptual assimilation task (PAT) that examined the cross-linguistic perceptual similarity between Mandarin tones and Japanese pitch accents. In experiment 2, the same Chinese groups and 35 native Japanese listeners completed a perceptual discrimination test (PDT) of Japanese pitch accent contrasts. Results of PAT showed that Chinese learners successfully categorized Japanese pitch accents into their native Mandarin tone categories: they perceived Japanese H*L as Mandarin Tone 4 (falling tone), LH* as Tone 2 (rising tone), and LH as Tone 1 (level tone). Moreover, results of PDT showed that Chinese learners were able to discriminate H*L–LH* and H*L–LH but had difficulty in the discrimination of LH*–LH. The results also show that Chinese learners’ ability to discriminate Japanese pitch accent contrasts did not improve consistently with increased Japanese experience. This study argues that the LH*–LH contrast is hard for L2 learners regardless of their L2 experience, because of these two accents’ acoustic similarity. The results extended the PAM-S, suggesting that L2 speech perception could be influenced by both the L1–L2 assimilation patterns and acoustic similarity.
Journal of Phonetics, Volume 113, Article 101458. Publication type: Journal Article.
Citations: 0
Gestural restructuring beyond coarticulation in Korean /w/-vowel sequences: Evidence from phonetic, dialectal, and gender variation
IF 2.4; Zone 1 (Literature); LANGUAGE & LINGUISTICS; Pub Date: 2025-11-01; DOI: 10.1016/j.wocn.2025.101456
Dae-yong Lee, Sahyang Kim, Taehong Cho
This study examines the articulatory patterns of Korean /w/-vowel sequences by comparing tongue dorsum movement trajectories with those of corresponding plain vowels, using Electromagnetic Articulography data from 48 speakers of Seoul and North Gyeongsang dialects. The central question is whether these sequences reflect mere coarticulation or exhibit signs of gestural restructuring in the nucleus vowel. Results reveal gradient restructuring shaped by vowel constriction degree, dialect, and gender. High vowels (/wi/-/i/) show minimal divergence, mid vowels (/we/-/e/, /wɛ/-/ɛ/) moderate divergence, and low back vowels (/wa/-/a/, /wʌ/-/ʌ/) the greatest divergence—especially in dialect- and gender-specific ways. Further analysis of the /e/-/ɛ/ merger and the recent /ʌ/-/ɨ/ split in North Gyeongsang sheds light on how vowel distinctions interact with /w/. The /we/-/wɛ/ pair shows a stronger merger than /e/-/ɛ/, supporting the view that /w/ triggers gestural restructuring of the nucleus vowel and thus plays an active role in reshaping merger trajectories. This effect is further illustrated by the /wa/-/wʌ/ and /a/-/ʌ/ contrasts, with a stronger merger in the /w/-initial context—an effect notably led by male speakers. Interestingly, North Gyeongsang males preserve the /a/-/ʌ/ contrast more robustly than the /wa/-/wʌ/ contrast, possibly due to hyperarticulation of a phonetically redefined /ʌ/ resulting from the recent /ʌ/-/ɨ/ split. These findings are interpreted within a dynamical framework of gestural blending strength (GBS), which varies by vowel constriction and coarticulatory resistance but remains stable for /w/. Overall, the results suggest that what may have begun as low-level coarticulation has evolved into systematic gestural restructuring—a gradient shift toward phonological reorganization shaped by phonetic context, sound change, and sociophonetic variation.
Journal of Phonetics, Volume 113, Article 101456. Publication type: Journal Article.
Citations: 0
Transitory and sustained Cf0 effects: Evidence from Swiss German
IF 2.4; Zone 1 (Literature); LANGUAGE & LINGUISTICS; Pub Date: 2025-10-03; DOI: 10.1016/j.wocn.2025.101453
Franka Zebe-Sheng, Camille Watter, Stephan Schmid, D. Robert Ladd
It is generally agreed that f0 following phonologically voiceless plosives is higher than after voiced plosives. Such consonant f0 (Cf0) effects have been reported in many languages. However, the phonetic basis of the ‘voiceless’ – ‘voiced’ distinction may differ between languages; for example, in English the distinction involves long-lag VOT in ‘voiceless’ plosives and short-lag VOT or prevoicing in ‘voiced’ plosives, while in Dutch the ‘voiceless’ plosives have short-lag VOT and the ‘voiced’ plosives are generally prevoiced. This study focuses on Swiss German, where neither long-lag VOT nor voicing is present: the primary difference between lenis (‘voiced’) and fortis (‘voiceless’) plosives lies in closure duration. Replicating Ladd and Schmid [Journal of Phonetics (2018), 71, 229–248], we show that both lenis and fortis plosives exhibit higher initial f0 followed by a brief fall, typical of ‘voiceless’ plosives in many languages. Using newer statistical methods (Generalised Additive Mixed Models), we also show that, during the latter part of the vowel beyond the initial f0 drop, overall f0 level is significantly higher after ‘fortis’ than after ‘lenis’ plosives. This suggests that two distinct but interacting Cf0 effects are involved. We discuss the relevance of this finding for future experimental work on Cf0.
Journal of Phonetics, Volume 113, Article 101453. Publication type: Journal Article.
Citations: 0
Effects of minimal pair competitors on voice onset time and pitch accent production in South Swedish
IF 2.4 Zone 1 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2025-09-05 DOI: 10.1016/j.wocn.2025.101445
Benjamin M. Kramer, Jason A. Shaw
Previous findings suggest that words in minimal pairs are hyperarticulated along the phonetic dimension that distinguishes them. We investigated the effects of minimal pair presence on the production of the pitch accent contrast and the stop voicing contrast in South Swedish; while contrastive hyperarticulation along these dimensions has been observed in other languages, these contrasts in South Swedish have a particularly low functional load and a particularly high category distance, respectively. Results from an experimental word naming task indicate that minimal pair competition does not significantly affect voice onset time in South Swedish. For the pitch accent contrast, minimal pair competition is significantly correlated with converged rather than diverged accent contours. These findings are consistent with activation dynamics of phonetic planning that are sensitive to language-specific characteristics of a contrast, such as category distance and functional load.
Journal of Phonetics, Volume 113, Article 101445.
Citations: 0
The effect of rhythm on inter-gestural coupling of onset and vowel gestures and predictive timing in stuttering
IF 1.9 Zone 1 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2025-07-21 DOI: 10.1016/j.wocn.2025.101432
Mona Franke, Simone Falk, Nicole Benker, Phil Hoole
In this study we investigate articulatory timing in fluent speech production in persons who stutter (PWS) and persons who do not stutter (PWNS) by focusing on consonant–vowel (CV)-timing, which refers to the coupling of onset consonant and vowel gestures, as well as on predictive timing, which describes the synchronization of the speech onset to a rhythmic event. These two timing mechanisms are particularly interesting to investigate in relation to stuttering, given that CV-timing is especially challenging for PWS and that they exhibit differences in predictive timing related to speech-motor and manual-motor tasks, suggesting that disturbances in inter-gestural coordination and auditory-motor integration may contribute to stuttering. To shed further light on this, we examine CV-timing and predictive timing under different rhythmic conditions.
Twenty German-speaking adults (10 PWS and 10 PWNS) were recorded using electromagnetic articulography (EMA). Participants produced target words that started with a bilabial onset, followed by a vowel (/a/, /o/, or /u/) and were embedded in a carrier phrase in four different conditions: Unpaced (speaking), Tapping (speaking while concurrently tapping), Metronome (synchronizing speech to a metronome), and Metronome+Tapping (speaking to a metronome while concurrently tapping).
We found evidence for both CV-timing and predictive timing differences between PWS and PWNS. Our results suggest that in general, PWS time CV gestures closer together. However, CV-timing differences were linked to condition in an unexpected way. As to predictive timing, PWS initiated their speech later to a metronome beat than PWNS but they did not differ when timing speech to their own finger tapping, indicating that motor-pacing may stabilize the speech motor system of PWS. In the Metronome+Tapping condition, the groups appeared to rely on different rhythmic cues. While PWNS timed their speech more towards the metronome beat, PWS synchronized their speech onset closer to the finger tap. We discuss that this difference could result from differences in CV-timing. Furthermore, the potential for future research on the interplay of non-verbal and verbal motor systems and the possible benefit for the stuttering population is discussed.
Journal of Phonetics, Volume 112, Article 101432.
Citations: 0
Phonetic information in the vowel spectrum: the meaning of mel-Frequency Cepstral Coefficients
IF 1.9 Zone 1 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2025-07-17 DOI: 10.1016/j.wocn.2025.101434
Khalil Iskarous, Alessandro Vietti
There is still disagreement in the acoustic phonetics literature on how phonetic information is encoded in the vowel acoustic spectrum. The “formant hypothesis” holds that formant frequency locations are the primary encoding of phonetic information. But perceptual experiments have shown that listeners can identify vowels, to a certain extent, even when formant peaks are suppressed. This has given rise to the “whole-spectrum” hypothesis, which describes each vowel segment in terms of a high-dimensional description of its entire spectrum. While the “whole-spectrum” hypothesis better predicts suppressed-formant vowel perception, one advantage of the “formant hypothesis” is that it parameterizes a vowel inventory of a language in terms of featural classes indexed by a few values of formant frequencies. These frequency scales serve to describe a language’s phonological organization and sound change. In this paper, we show that the mel-frequency Cepstral Coefficients (MFCCs), whole-spectrum parameterizations that have been used in speech technology from the 1970’s till today, also have a phonetic interpretation leading to the same featural classes as traditional description. This is despite the fact that for many decades they have been thought to not be interpretable. Our arguments are based on analyses of all vowel data from the TIMIT database, with large amounts of speaker, context, prosodic, and dialectal variability, using information theory, effect-size statistics, and Fourier theory. Our goal is to show that MFCCs can be useful for further developments in the field of acoustic phonetics, because while they extract phonetically-distinctive information from the entire spectrum, they can also further understanding of the linguistic structure of vowel spaces.
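For readers less familiar with the parameterization this abstract refers to, the textbook MFCC recipe can be sketched in a few lines: window a frame, take its power spectrum, pool the spectrum with triangular filters spaced on the mel scale, take logs, and apply a cosine transform. This is a minimal illustration only, not the authors' analysis pipeline; the frame length, filter count, number of coefficients, and the synthetic two-tone test frames are all assumptions chosen for brevity.

```python
import math

def hz_to_mel(f):
    # A common mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Log mel-band energies followed by a DCT-II (unnormalised)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small floor avoids log(0) if a band happens to be empty.
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]

# Hypothetical stand-ins for two vowel-like frames: two sinusoids each,
# placed roughly where low formants might fall. Not real speech data.
sr = 8000
frame_a = [math.sin(2 * math.pi * 300 * i / sr) + 0.5 * math.sin(2 * math.pi * 2300 * i / sr)
           for i in range(256)]
frame_b = [math.sin(2 * math.pi * 700 * i / sr) + 0.5 * math.sin(2 * math.pi * 1100 * i / sr)
           for i in range(256)]
coeffs_a = mfcc(frame_a, sr)
coeffs_b = mfcc(frame_b, sr)
```

Because the final step is a cosine transform over the log mel spectrum, each coefficient weights the whole spectrum with a cosine of a different period. That is the sense in which MFCCs are a whole-spectrum description rather than a list of peak locations.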
Journal of Phonetics, Volume 112, Article 101434.
Citations: 0