Pub Date : 2026-01-01 DOI: 10.1016/j.wocn.2025.101461
Jungyun Seo, Ruaridh Purse, Jelena Krivokapić
This study investigates how planning and prosodic structure interact in speech production. Planning is operationalized in this study as the selection of one lexical item from two possible candidates. In an Electromagnetic Articulometry study that elicited this planning at a word or phrase boundary, two questions were examined. The first question tested whether planning has an effect on the kinematic properties of prosodic phrase boundaries. The results show that an increase in planning load does not affect the scope of boundary-related gestural lengthening but only leads to an increase in pause duration. The second question tested the effect of planning at word boundaries, specifically whether an increase in planning load at a word boundary leads to the production of a prosodic phrase boundary or just the insertion of a pause. The results show that an increase in the planning load at word boundaries leads to lengthening of gestures at a boundary and the insertion of pauses, indicating that speakers insert prosodic phrase boundaries when they need more planning time.
Title: The Interplay of Planning and Prosody: Investigating the Bidirectional Influences of Planning and Prosody in Speech Production. Journal of Phonetics, vol. 114, Article 101461.
Intonation, the linguistic use of voice pitch, is critical for acquisition, phonetic organisation at the phrasal level and subsequent speech processing. By indicating syntactic constituency and encoding information structure and conversational implicatures, intonation is not only crucial for communication, but also for understanding grammatical phenomena such as information structure and syntax. Nevertheless, existing research has not always done justice to the roles that intonation plays in grammar, acquisition, and communication. Here, we first present an overview of what intonation is (and what it is not), and briefly discuss the essential tenets of the autosegmental-metrical (AM) theory of intonational phonology, currently the most influential and widely used approach to intonation. We critically review the standard methods of conducting intonation research within AM and present newer methodologies developed in response to shortcomings of the standard methods. We conclude by reviewing potential changes to AM in light of the findings stemming from these newer methods, followed by suggestions for future research.
Title: Advancements of phonetics in the 21st century: Intonation. Authors: Amalia Arvaniti, Martine Grice, Mariapaola D’Imperio. Pub Date : 2026-01-01 DOI: 10.1016/j.wocn.2025.101459. Journal of Phonetics, vol. 114, Article 101459.
Pub Date : 2026-01-01 DOI: 10.1016/j.wocn.2025.101472
Miao Zhang, Shuxiao Gong, Chengyu Guo
This study investigates tone sandhi and tonal coarticulation in disyllabic sequences in Changsha Xiang, a Sinitic language with six lexical tones in Hunan Province, China. Changsha Xiang exhibits two distinct tone sandhi patterns: right-dominant and left-dominant. Previous descriptions of these patterns have been largely impressionistic, lacking an instrumental phonetic analysis. Building on data from 16 native speakers, this study provides an acoustic analysis of the tonal realization, coarticulation, and phonetic reduction patterns in Changsha Xiang. Our results indicate that in right-dominant sequences, the non-dominant syllable exhibits reduced F0 excursion without neutralization, whereas in left-dominant sequences, the non-dominant syllable undergoes paradigmatic neutralization to four short level tones. Additionally, the study finds that the tone sandhi pattern affects tonal coarticulation, with a larger carryover effect in non-dominant final syllables. These findings contribute to a broader understanding of tonal processes in Sinitic languages and highlight parallels between tone sandhi and stress-tone interactions in other languages.
Title: Tone sandhi and tonal coarticulation in disyllabic sequences in Changsha Xiang. Journal of Phonetics, vol. 114, Article 101472.
Pub Date : 2025-12-26 DOI: 10.1016/j.wocn.2025.101471
Hicham Adem
Previous research on Arabic-speaking learners of English (L2) has often focused on adults and relied on static formant measurements, underscoring the need for dynamic approaches that better capture the temporal characteristics of L2 vowel production. Grounded in the revised Speech Learning Model (SLM-r), this study analyzes the formant trajectories of seven English vowels in CVC contexts, produced by 65 adolescent female learners whose L1 is Palestinian Arabic. Static vowel targets were descriptively compared to normative values from a corpus of L1 English speakers (N = 436; Lee et al., 1999), with age, gender, and CVC context matched across datasets; dynamic analyses were conducted on the L2 learner data only. Formant trajectories were modeled as multidimensional time series to assess Vowel-Inherent Spectral Change (VISC), using three trajectory-based measures—Vector Length (VL), Trajectory Length (TL), and Formant Velocity (FV)—alongside angular displacement as an additional cue. A multi-method approach combining Linear Mixed-Effects modeling (LME), Functional Principal Component Analysis (fPCA), and trajectory analysis identified group-level patterns of vowel-space compression, slower formant transitions, and reduced spectral–temporal modulation. fPCA showed joint reductions in F1 and F2 dynamics, reflecting a centralized, flattened vowel space with limited temporal modulation and diminished contrast. These outcomes suggest system-level reorganization of the L2 vowel space, shaped by L1-based constraints. Trajectory orientation emerged as a secondary cue when integrated with other time-varying measures, with directional characteristics providing limited but detectable acoustic distinctiveness. Classification models comparing dynamic, static, combined, and direction-only features showed that time-resolved spectral cues most consistently supported within-speaker classification and were more sensitive to spectral variation. 
Trajectory and fPCA analyses extend the empirical scope of the SLM-r by highlighting that equivalence classification may involve not only reduced acoustic category separation but also a reorganization of the spectral–temporal geometry of the L2 vowel system. These findings offer a signal-level perspective on vowel restructuring as it unfolds acoustically in underrepresented L2 learners and broaden our understanding of the acoustic structure of L2 vowels across diverse learning environments.
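The three trajectory-based measures named in the abstract can be illustrated with a minimal Python sketch. The definitions below are assumptions based on common usage in work on vowel-inherent spectral change, not necessarily the formulas used in the study: VL is taken as the straight-line distance between the first and last sampled (F1, F2) points, TL as the sum of distances between successive points, and FV as TL divided by the elapsed time. The function names and the sample values are hypothetical.

```python
import math

def vector_length(points):
    """Straight-line distance in F1-F2 space between the first and last sample."""
    (f1_a, f2_a), (f1_b, f2_b) = points[0], points[-1]
    return math.hypot(f1_b - f1_a, f2_b - f2_a)

def trajectory_length(points):
    """Sum of distances between successive (F1, F2) samples."""
    return sum(
        math.hypot(b[0] - a[0], b[1] - a[1])
        for a, b in zip(points, points[1:])
    )

def formant_velocity(points, duration_s):
    """Trajectory length per second (assumed definition, in Hz/s)."""
    return trajectory_length(points) / duration_s

# Hypothetical (F1, F2) samples in Hz, taken at three points in one vowel
track = [(500.0, 1500.0), (530.0, 1540.0), (530.0, 1600.0)]
duration = 0.1  # hypothetical time between first and last sample, in seconds

vl = vector_length(track)
tl = trajectory_length(track)
fv = formant_velocity(track, duration)
```

Under these definitions TL can never be smaller than VL, so a trajectory that bends (as in the sample values) shows up as a gap between the two measures.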
Title: Dynamic formant trajectories in L2 English: evidence from Arabic-speaking adolescent learners. Journal of Phonetics, vol. 114, Article 101471.
Pub Date : 2025-12-23 DOI: 10.1016/j.wocn.2025.101470
Taehong Cho, Sahyang Kim, Holger Mitterer
This special issue examines how fine phonetic detail participates in the shaping of sound systems. Across fourteen studies, the central theme is that subtle temporal, spectral, and articulatory patterns are not incidental by-products of articulation, but are systematically regulated aspects of speakers’ phonetic knowledge. They provide the means through which phonological contrasts and prosodic structure are realized, maintained, and sometimes reorganized. The contributions show how languages allocate continuous phonetic parameters—such as timing, coordination, voice quality, and nasality—within prosodic domains (e.g., phrases, words, and syllables) and under general biomechanical and communicative pressures. Studies of Irish, Hawaiian, Japanese, and Mandarin illustrate how prosodic structure guides segmental and suprasegmental realization. Work on English, German, Danish, and Cantonese demonstrates how fine phonetic detail underlies patterns of variation and creates potential pathways for change. Production connects naturally to perception and learning: findings from English accent adaptation and Samoan iterated learning reveal how listeners stabilize or reinterpret detail, linking individual processing to community-level patterning. A set of studies on Italian, Korean, English, and L2 German show how prominence reorganizes cues across articulation, interaction, and acquisition, shaping how speakers signal and listeners recover linguistic structure. These studies converge on a view in which fine phonetic detail arises from a central phonetic component (or the phonetic grammar) of linguistic structure—controlled by speakers, shaped by universal motor and perceptual constraints, and continually adjusted through perception and learning. In this perspective, sound systems emerge from the interplay of these regulated patterns, which sustain contrasts, support communication, and open principled routes for change.
Title: Linguistic and cognitive functions of fine phonetic detail underlying sound systems and sound change. Journal of Phonetics, vol. 114, Article 101470.
Pub Date : 2025-11-01 DOI: 10.1016/j.wocn.2025.101457
Wei Zhang, Meghan Clayards, Morgan Sonderegger
Native speakers imitate F0 contours that vary between two lexical tones non-linearly: they do not precisely reproduce the presented F0 features but instead cluster them toward tonal categories, the so-called contrast mediation effect. However, less is known about whether non-native speakers who lack lexical tone phonology will show linear imitation of F0 contours. Addressing this question will deepen our understanding of whether F0 imitation is solely influenced by lexical tone contrasts or also shaped by other sources of non-linearity beyond phonological contrasts. To investigate this, the current study examined the categorization and imitation of a Mandarin flat-falling tonal continuum by both Mandarin speakers and English speakers who were naïve to tonal languages. Imitation distributions were analyzed by comparing two models: a linear regression model, which assumes participants linearly track phonetic cues, and a mixture regression model, which assumes imitation reflects underlying categories. The mixture regression model fit the data better for the Mandarin speakers while the reverse was true for the English speakers, suggesting that Mandarin speakers imitated the F0 contours more categorically than English speakers. However, for both groups, the data were best fit by a weighted combination of both models. For the Mandarin group, this result, along with additional analyses of duration, F1, and intensity, suggests that tone categories involve both phonological and phonetic information and that imitation taps both, possibly via hyper- and hypo-articulation. For English participants, the evidence for categorical mediation suggests that imitation is mediated by factors other than lexically contrastive linguistic categories, although the exact nature of the factors is unclear.
Title: Imitation of F0 tone contours by Mandarin and English speakers is both categorical and continuous. Journal of Phonetics, vol. 113, Article 101457.
Pub Date : 2025-11-01 DOI: 10.1016/j.wocn.2025.101454
Constantijn Kaland
Recent work applied cluster analysis to f0 contours in order to find ‘prototypical’ or ‘underlying’ categories as assumed in intonational phonology. However, it remains to be answered to what extent meaningful f0 variation can indeed be captured using automatic classification of surface realizations. Studies on f0 dynamics have suggested that derivatives (e.g., f0 velocity, acceleration and jerk) closely approximate the meaningful components of f0. The question answered in this study is to what extent f0 derivatives are more informative for cluster analysis than other metrics, such as the (time series) f0 contour they are derived from, a static measure representing it, or other acoustic measures such as intensity and duration. This is tested across two clustering techniques (hierarchical and k-medoids) for three different meaningful features expressed in Dutch noun phrases (of the type ‘blue sofa’): focus type (broad, narrow), focus position (adjective, noun) and phrase position (medial, final). Results show that derivatives are among the most informative acoustic measures, although the best performing cluster analyses are the ones based on multiple acoustic measures. Crucially, cluster analyses reveal that the different meaningful prosodic features each have their own characteristics in terms of acoustics and number of clusters.
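As a rough illustration of the f0 derivatives mentioned in the abstract (velocity, acceleration, and jerk), the following Python sketch differentiates a sampled contour three times with finite differences. The abstract does not say how the study computed its derivatives, so the differencing scheme, the `derivative` helper, the 10 ms frame step, and the contour values are all assumptions made for illustration.

```python
def derivative(values, dt):
    """Finite-difference derivative: central differences inside, one-sided at the edges."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    out = []
    for i in range(n):
        lo = max(i - 1, 0)       # previous sample, or the first one at the left edge
        hi = min(i + 1, n - 1)   # next sample, or the last one at the right edge
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out

dt = 0.01  # hypothetical 10 ms frame step, in seconds
# Hypothetical rising contour in Hz: f0(t) = 120 + 100 * t^2
f0 = [120.0 + 100.0 * (i * dt) ** 2 for i in range(11)]

velocity = derivative(f0, dt)            # Hz/s
acceleration = derivative(velocity, dt)  # Hz/s^2
jerk = derivative(acceleration, dt)      # Hz/s^3
```

Each pass amplifies measurement noise, so real f0 tracks would normally be smoothed before differencing; that step is omitted here to keep the sketch short.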
Title: F0 derivatives in the classification of meaningful tonal movements. Journal of Phonetics, vol. 113, Article 101454.
Pub Date : 2025-11-01 DOI: 10.1016/j.wocn.2025.101460
Keke Yu, Jie Zhang, Zilong Li, Xuliang Zhang, Yiyuan He, Li Li, Ruiming Wang
Lexical tone acquisition is a significant yet challenging aspect of learning a tonal language as a second language (L2). Embodied cognition theory offers a promising perspective by highlighting the role of pitch gestures in L2 lexical tone learning. Nevertheless, how pitch gestures facilitate L2 lexical tone acquisition, particularly in relation to native language (L1)–L2 perceptual assimilation patterns, remains poorly understood. The present study recruited native Mandarin speakers to learn Thai lexical tones. First, we classified different types of Mandarin-Thai tone assimilation before learning. Next, we employed tone discrimination and identification tasks to investigate how pitch gestures facilitate the learning of Thai lexical tones with varying assimilation patterns. We compared three learning approaches: pitch gesture production, pitch feature observation, and word-picture association. The results revealed three Mandarin-Thai tone assimilation patterns: the Mid and Low Thai tones were assimilated to Mandarin Tone 1; the Falling Thai tone was assimilated to Mandarin Tone 4; and the High and Rising Thai tones were assimilated to Mandarin Tone 2. Notably, the pitch gesture production approach enhanced learners’ ability to discriminate between Thai tones assimilated to different Mandarin tones and to identify Thai tones assimilated to Mandarin Tone 1 (Mid/Low) and Tone 4 (Falling). These findings indicate that while embodied experience through pitch gesture production facilitates L2 lexical tone acquisition, its efficacy is modulated by L1–L2 perceptual assimilation patterns. Based on these results, we propose an embodied learning viewpoint that incorporates L1 tonal experience, offering new insights into L2 lexical tone acquisition.
Title: How pitch gestures facilitate L2 lexical tone learning: The role of L1–L2 perceptual assimilation in Mandarin speakers’ acquisition of Thai tones. Journal of Phonetics, vol. 113, Article 101460.
Pub Date : 2025-11-01 DOI: 10.1016/j.wocn.2025.101458
Yu Yang, Longjie Dong, Quansheng Xia, Yuxiao Yang, Fei Chen
The acquisition of suprasegmental features in a second language (L2), such as lexical tone and pitch accent, can be challenging, yet the impact of cross-linguistic similarity on learning these features has been underexplored. This study explored the role of cross-linguistic similarity in Chinese learners’ perception of Japanese pitch accents, aiming to verify the Perceptual Assimilation Model for Suprasegmentals (PAM-S). In Experiment 1, 25 Chinese learners of Japanese with lower proficiency and 24 learners with higher proficiency completed a perceptual assimilation task (PAT) that examined the cross-linguistic perceptual similarity between Mandarin tones and Japanese pitch accents. In Experiment 2, the same Chinese groups and 35 native Japanese listeners completed a perceptual discrimination test (PDT) of Japanese pitch accent contrasts. Results of the PAT showed that Chinese learners successfully categorized Japanese pitch accents into their native Mandarin tone categories: they perceived Japanese H*L as Mandarin Tone 4 (falling tone), LH* as Tone 2 (rising tone), and LH as Tone 1 (level tone). Moreover, results of the PDT showed that Chinese learners were able to discriminate H*L–LH* and H*L–LH but had difficulty discriminating LH*–LH. The results also show that Chinese learners’ ability to discriminate Japanese pitch accent contrasts did not improve consistently with increased Japanese experience. This study argues that the LH*–LH contrast is hard for L2 learners regardless of their L2 experience because the two accents are acoustically similar. The results extend the PAM-S, suggesting that L2 speech perception could be influenced by both L1–L2 assimilation patterns and acoustic similarity.
{"title":"Cross-linguistic similarity in L2 suprasegmental learning: evidence from Chinese learners’ perception of Japanese pitch accents","authors":"Yu Yang , Longjie Dong , Quansheng Xia , Yuxiao Yang , Fei Chen","doi":"10.1016/j.wocn.2025.101458","DOIUrl":"10.1016/j.wocn.2025.101458","url":null,"PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"113 ","pages":"Article 101458"},"PeriodicalIF":2.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145424455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-01 DOI: 10.1016/j.wocn.2025.101456
Dae-yong Lee, Sahyang Kim, Taehong Cho
This study examines the articulatory patterns of Korean /w/-vowel sequences by comparing tongue dorsum movement trajectories with those of corresponding plain vowels, using Electromagnetic Articulography data from 48 speakers of Seoul and North Gyeongsang dialects. The central question is whether these sequences reflect mere coarticulation or exhibit signs of gestural restructuring in the nucleus vowel. Results reveal gradient restructuring shaped by vowel constriction degree, dialect, and gender. High vowels (/wi/-/i/) show minimal divergence, mid vowels (/we/-/e/, /wɛ/-/ɛ/) moderate divergence, and low back vowels (/wa/-/a/, /wʌ/-/ʌ/) the greatest divergence—especially in dialect- and gender-specific ways. Further analysis of the /e/-/ɛ/ merger and the recent /ʌ/-/ɨ/ split in North Gyeongsang sheds light on how vowel distinctions interact with /w/. The /we/-/wɛ/ pair shows a stronger merger than /e/-/ɛ/, supporting the view that /w/ triggers gestural restructuring of the nucleus vowel and thus plays an active role in reshaping merger trajectories. This effect is further illustrated by the /wa/-/wʌ/ and /a/-/ʌ/ contrasts, with a stronger merger in the /w/-initial context—an effect notably led by male speakers. Interestingly, North Gyeongsang males preserve the /a/-/ʌ/ contrast more robustly than the /wa/-/wʌ/ contrast, possibly due to hyperarticulation of a phonetically redefined /ʌ/ resulting from the recent /ʌ/-/ɨ/ split. These findings are interpreted within a dynamical framework of gestural blending strength (GBS), which varies by vowel constriction and coarticulatory resistance but remains stable for /w/. Overall, the results suggest that what may have begun as low-level coarticulation has evolved into systematic gestural restructuring—a gradient shift toward phonological reorganization shaped by phonetic context, sound change, and sociophonetic variation.
{"title":"Gestural restructuring beyond coarticulation in Korean /w/-vowel sequences: Evidence from phonetic, dialectal, and gender variation","authors":"Dae-yong Lee , Sahyang Kim , Taehong Cho","doi":"10.1016/j.wocn.2025.101456","DOIUrl":"10.1016/j.wocn.2025.101456","url":null,"PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"113 ","pages":"Article 101456"},"PeriodicalIF":2.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145424456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}