Pub Date: 2026-03-01; Epub Date: 2026-02-02; DOI: 10.1016/j.wocn.2026.101473
Hironori Katsuda, Yoonjung Kang
This study examines the sensitivity of vowels and consonants to speaking rate variations in both production and perception, using Japanese as a case study. In contrast to prior studies, which suggest that vowels are more responsive to speaking rate changes than consonants in production, our results indicate a more nuanced distinction between vowels and stops versus fricatives and nasals, with the former group exhibiting greater sensitivity to speaking rate changes. Furthermore, this production pattern was also generally reflected, though to a lesser extent, in the perception results. These findings point to the need for further research into factors such as the presence or absence of length distinctions, language-specific prosodic and rhythmic characteristics, and the relationship between the ratios of long to short segments and slow to fast speaking rates.
Title: Speaking rate effects on Japanese vowel and consonant length contrasts
Journal: Journal of Phonetics, volume 115, Article 101473
Pub Date: 2026-03-01; Epub Date: 2026-02-11; DOI: 10.1016/j.wocn.2026.101485
Xiaojuan Zhang, Bing Cheng, Yang Zhang
The mechanisms linking speech production and perception remain underspecified, particularly in how segmental and suprasegmental features are processed across different contextual variations. This study investigated whether perceptual cue weighting could be predicted by distributional reliability of acoustic cues in production, focusing on the Mandarin Tone 2–Tone 3 contrast across both gradient coarticulatory (T1, T2, T4) and categorical T3 sandhi contexts. We quantified production distributional reliability using the Bhattacharyya coefficient and assessed perceptual cue weighting through relative weight analysis. Bayesian mixed-effects modeling showed strong evidence for context-dependent acoustic distributions in production and cue weighting in perception. Critically, production–perception coupling emerged selectively. In gradient contexts, higher production reliability strongly predicted perceptual weighting, with robust correlations for critical cues in the T2 and T4 contexts, though this pattern was less evident in the T1 context. No such coupling was observed for secondary cues across contexts. In contrast, in the categorical T3 sandhi context, production statistics did not predict perceptual weights. These findings reveal a context-sensitive production–perception relationship: tightly coupled in gradient coarticulatory contexts, but dissociated in categorical rule-governed environments. This pattern suggests that tone processing involves a dynamic interplay between bottom-up sensitivity to statistical regularities and top-down phonological constraints, rather than relying on a uniform statistical mapping mechanism.
Title: Context-dependent coupling and dissociation between speech production and perception in Mandarin tones
Journal: Journal of Phonetics, volume 115, Article 101485
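The abstract above quantifies the distributional reliability of a cue in production with the Bhattacharyya coefficient (BC), which measures how much two distributions overlap: 1 for identical distributions, 0 for disjoint ones. The sketch below is only an illustration of that coefficient, not the authors' procedure. The histogram estimator, the bin count, the function name, and the sample values are all assumptions introduced here.

```python
import math


def bhattacharyya_coefficient(xs, ys, n_bins=20):
    """Histogram estimate of the overlap between two samples of one cue.

    Hypothetical helper: the binning scheme is an assumption, not taken
    from the study summarized above.
    """
    lo = min(min(xs), min(ys))
    hi = max(max(xs), max(ys))
    # Fall back to a unit bin width when every value is identical.
    width = (hi - lo) / n_bins or 1.0

    def normalized_histogram(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp the maximum value into the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    p = normalized_histogram(xs)
    q = normalized_histogram(ys)
    # BC = sum over bins of sqrt(p_i * q_i).
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


# Made-up cue values for two tone categories (illustrative only).
tone2_cue = [1.0, 2.0, 3.0]
tone3_cue = [10.0, 11.0, 12.0]
overlap = bhattacharyya_coefficient(tone2_cue, tone3_cue)
```

Under this reading, a cue whose two category distributions overlap less (lower BC) would count as more reliable for separating the categories. How the study converts BC into a reliability score is not stated in the abstract.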
Pub Date: 2026-03-01; Epub Date: 2026-02-13; DOI: 10.1016/j.wocn.2026.101484
Lian J. Arzbecker, Ewa Jacewicz, Robert A. Fox
Listeners’ ability to categorize speakers based on their accents—whether regional, first- or second-language—is primarily based on sociocultural knowledge and familiarity with the phonetic features of those varieties. Here, we explore whether English listeners’ categorization decisions are also influenced by perceptual similarity of those accents to their own, and whether a quantitative pronunciation distance metric can predict accent categorization. Current pronunciation distance metrics have been restricted to segments (vowels and consonants) as they have not successfully integrated suprasegmental information (prosody and rhythm). Consequently, we examine the contributions of (1) segmental cues in unmodified speech and (2) suprasegmental information using low-pass and high-pass filtering. American English listeners responded to unmodified and filtered phrases produced by 24 speakers representing four accent varieties: American English, British/Australian English, Chinese English, and Indian English. Accent categorization accuracy was not fully consistent with predictions based on pronunciation distance, but the metric predicted confusion patterns: greater distance from listeners’ own accent decreased confusions with that accent. These findings indicate that greater perceptual distance from a native variety may facilitate adaptation to features of the distant accent, detection of systematic variation in speech of multiple speakers, and recognition of accent-general properties. For filtered speech, categorization accuracy was lower but still above chance. Across the four accent varieties, the contribution of suprasegmental information did not mirror that of segmental information. Overall, the study provides evidence that listeners’ decisions are not solely based on sociocultural knowledge and accent familiarity, but phonetic similarity also plays a role in categorization processes.
Title: Does pronunciation distance predict accent categorization? Evidence for the respective contributions of segment distance and suprasegmentals
Journal: Journal of Phonetics, volume 115, Article 101484
Pub Date: 2026-03-01; Epub Date: 2026-02-13; DOI: 10.1016/j.wocn.2026.101483
McKinley Alden, Anja Arnhold
Impressionistic accounts of Chugach Alutiiq prosody describe a uniquely complex metrical system which builds both binary and ternary feet, is weight-sensitive, and demarcates metrical structure via stress, additional lengthening on ternary foot heads, foot-initial onset fortition, and metrically conditioned tone. They further state that compression, a length-neutralizing process, targets phonemic length and stress. Here, we present the first acoustic analysis of length and metrical expression in Alutiiq. Seven Alutiiq narratives were annotated, resulting in a data set of 2235 vowels and 2720 onset consonants. Linear mixed-effects modelling of vowel duration, intensity, and f0 revealed that the ternary distinction between unstressed short, stressed short, and obligatorily stressed long vowels is expressed via all three acoustic correlates. The exploration of onset fortition demonstrated that foot-initial onsets are fortified, as described in the literature. The acoustics of other metrical phenomena were more complex than previously described: compression was not a fully neutralizing phenomenon, so-called additional lengthening of some stressed syllables utilized f0 rather than duration, and so-called tones were not categorically distinguished by f0 as previously described. A significant difference did occur, as previously described, between two types of unstressed syllables, but this difference was durational rather than tonal. Based on these results, we conclude, first, that Alutiiq metrical prosody is indeed highly complex. Second, we suggest that these acoustic patterns are best modelled with reference to prosodic constituency and support an account using internally-layered ternary feet.
Title: Quantitative evidence of complex metrical prosody in Chugach Alutiiq
Journal: Journal of Phonetics, volume 115, Article 101483
Pub Date: 2026-03-01; Epub Date: 2026-01-31; DOI: 10.1016/j.wocn.2026.101474
Cheonkam Jeong, Andrew Wedel
In Seoul Korean, VOT is historically a primary cue distinguishing the aspirated∼lenis contrast in stops, with fundamental frequency (F0) of the following vowel as a secondary cue. However, for many speakers a transphonologization is in progress in which the VOT cue progresses toward neutralization, with a concomitant expansion of the F0 contrast. Prior research shows that phonetic cues which distinguish a lexical minimal pair are hyperarticulated, and suggests that this contrastive hyperarticulation influences the trajectory of change in that phoneme contrast. Here we investigate minimal pair-associated hyperarticulation of the VOT and F0 contrasts of aspirated/lenis stops in both a production study and a study of a corpus of natural speech. We ask (i) if we in fact find contrastive hyperarticulation of the aspirated∼lenis distinction in minimal pairs, and (ii) if the degree of VOT hyperarticulation tracks the degree to which a speaker employs the VOT cue in the aspirated∼lenis distinction. We find that speakers contrastively hyperarticulate both the VOT and F0 cues to the aspirated∼lenis distinction, although hyperarticulation of the F0 contrast is less robust. Moreover, using a new measure of advancement in this sound change, we show that VOT still shows minimal pair-associated hyperarticulation even in advanced speakers who do not detectably use VOT in their general speech. We discuss possible explanations for why minimal pair-associated hyperarticulation of VOT appears to persist in these advanced speakers, and its implications for theories of sound change.
Title: The effect of contrast-specific minimal pair competitor in hyperarticulation of VOT and F0 phonetic cues in Korean initial stops in tonogenetic sound change
Journal: Journal of Phonetics, volume 115, Article 101474
Pub Date: 2026-01-01; Epub Date: 2025-12-31; DOI: 10.1016/j.wocn.2025.101461
Jungyun Seo, Ruaridh Purse, Jelena Krivokapić
This study investigates how planning and prosodic structure interact in speech production. Planning is operationalized in this study as the selection of one lexical item from two possible candidates. In an Electromagnetic Articulometry study that elicited this planning at a word or phrase boundary, two questions were examined. The first question tested whether planning has an effect on the kinematic properties of prosodic phrase boundaries. The results show that an increase in planning load does not affect the scope of boundary-related gestural lengthening but only leads to an increase in pause duration. The second question tested the effect of planning at word boundaries, specifically whether an increase in planning load at a word boundary leads to the production of a prosodic phrase boundary or just the insertion of a pause. The results show that an increase in the planning load at word boundaries leads to lengthening of gestures at a boundary and the insertion of pauses, indicating that speakers insert prosodic phrase boundaries when they need more planning time.
Title: The Interplay of Planning and Prosody: Investigating the Bidirectional Influences of Planning and Prosody in Speech Production
Journal: Journal of Phonetics, volume 114, Article 101461
Pub Date: 2026-01-01; Epub Date: 2025-12-23; DOI: 10.1016/j.wocn.2025.101470
Taehong Cho, Sahyang Kim, Holger Mitterer
This special issue examines how fine phonetic detail participates in the shaping of sound systems. Across fourteen studies, the central theme is that subtle temporal, spectral, and articulatory patterns are not incidental by-products of articulation, but are systematically regulated aspects of speakers’ phonetic knowledge. They provide the means through which phonological contrasts and prosodic structure are realized, maintained, and sometimes reorganized. The contributions show how languages allocate continuous phonetic parameters—such as timing, coordination, voice quality, and nasality—within prosodic domains (e.g., phrases, words, and syllables) and under general biomechanical and communicative pressures. Studies of Irish, Hawaiian, Japanese, and Mandarin illustrate how prosodic structure guides segmental and suprasegmental realization. Work on English, German, Danish, and Cantonese demonstrates how fine phonetic detail underlies patterns of variation and creates potential pathways for change. Production connects naturally to perception and learning: findings from English accent adaptation and Samoan iterated learning reveal how listeners stabilize or reinterpret detail, linking individual processing to community-level patterning. A set of studies on Italian, Korean, English, and L2 German show how prominence reorganizes cues across articulation, interaction, and acquisition, shaping how speakers signal and listeners recover linguistic structure. These studies converge on a view in which fine phonetic detail arises from a central phonetic component (or the phonetic grammar) of linguistic structure—controlled by speakers, shaped by universal motor and perceptual constraints, and continually adjusted through perception and learning. In this perspective, sound systems emerge from the interplay of these regulated patterns, which sustain contrasts, support communication, and open principled routes for change.
Title: Linguistic and cognitive functions of fine phonetic detail underlying sound systems and sound change
Journal: Journal of Phonetics, volume 114, Article 101470
Intonation, the linguistic use of voice pitch, is critical for acquisition, phonetic organisation at the phrasal level and subsequent speech processing. By indicating syntactic constituency and encoding information structure and conversational implicatures, intonation is not only crucial for communication, but also for understanding grammatical phenomena such as information structure and syntax. Nevertheless, existing research has not always done justice to the roles that intonation plays in grammar, acquisition, and communication. Here, we first present an overview of what intonation is (and what it is not), and briefly discuss the essential tenets of the autosegmental-metrical (AM) theory of intonational phonology, currently the most influential and widely used approach to intonation. We critically review the standard methods of conducting intonation research within AM and present newer methodologies developed in response to shortcomings of the standard methods. We conclude by reviewing potential changes to AM in light of the findings stemming from these newer methods, followed by suggestions for future research.
Title: Advancements of phonetics in the 21st century: Intonation
Authors: Amalia Arvaniti, Martine Grice, Mariapaola D’Imperio
Pub Date: 2026-01-01; DOI: 10.1016/j.wocn.2025.101459
Journal: Journal of Phonetics, volume 114, Article 101459
Pub Date: 2026-01-01; Epub Date: 2025-12-26; DOI: 10.1016/j.wocn.2025.101471
Hicham Adem
Previous research on Arabic-speaking learners of English (L2) has often focused on adults and relied on static formant measurements, underscoring the need for dynamic approaches that better capture the temporal characteristics of L2 vowel production. Grounded in the revised Speech Learning Model (SLM-r), this study analyzes the formant trajectories of seven English vowels in CVC contexts, produced by 65 adolescent female learners whose L1 is Palestinian Arabic. Static vowel targets were descriptively compared to normative values from a corpus of L1 English speakers (N = 436; Lee et al., 1999), with age, gender, and CVC context matched across datasets; dynamic analyses were conducted on the L2 learner data only. Formant trajectories were modeled as multidimensional time series to assess Vowel-Inherent Spectral Change (VISC), using three trajectory-based measures: Vector Length (VL), Trajectory Length (TL), and Formant Velocity (FV), alongside angular displacement as an additional cue. A multi-method approach combining Linear Mixed-Effects modeling (LME), Functional Principal Component Analysis (fPCA), and trajectory analysis identified group-level patterns of vowel-space compression, slower formant transitions, and reduced spectral–temporal modulation. fPCA showed joint reductions in F1 and F2 dynamics, reflecting a centralized, flattened vowel space with limited temporal modulation and diminished contrast. These outcomes suggest system-level reorganization of the L2 vowel space, shaped by L1-based constraints. Trajectory orientation emerged as a secondary cue when integrated with other time-varying measures, with directional characteristics providing limited but detectable acoustic distinctiveness. Classification models comparing dynamic, static, combined, and direction-only features showed that time-resolved spectral cues most consistently supported within-speaker classification and were more sensitive to spectral variation. Trajectory and fPCA analyses extend the empirical scope of the SLM-r by highlighting that equivalence classification may involve not only reduced acoustic category separation but also a reorganization of the spectral–temporal geometry of the L2 vowel system. These findings offer a signal-level perspective on vowel restructuring as it unfolds acoustically in underrepresented L2 learners and broaden our understanding of the acoustic structure of L2 vowels across diverse learning environments.
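The three trajectory-based measures named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the definitions below are assumptions that follow one common formulation in the VISC literature: VL as the straight-line distance between the first and last sampled points in the F1 x F2 plane, TL as the sum of the distances between consecutive samples, and FV as TL divided by the sampled duration. The function names, the five-point sampling, and the formant values are hypothetical and are not taken from the study.

```python
import math


def vector_length(f1, f2):
    """Straight-line distance (Hz) between the first and last samples
    in the F1 x F2 plane. Assumed definition, not the study's own."""
    return math.hypot(f1[-1] - f1[0], f2[-1] - f2[0])


def trajectory_length(f1, f2):
    """Sum of the straight-line distances (Hz) between consecutive
    samples in the F1 x F2 plane. Assumed definition."""
    return sum(
        math.hypot(f1[i + 1] - f1[i], f2[i + 1] - f2[i])
        for i in range(len(f1) - 1)
    )


def formant_velocity(f1, f2, duration_s):
    """Trajectory length divided by the time spanned by the samples
    (Hz per second). Assumed definition."""
    return trajectory_length(f1, f2) / duration_s


# Hypothetical formant values (Hz) at five equidistant points of one vowel token
f1 = [500.0, 503.0, 503.0, 500.0, 500.0]
f2 = [1500.0, 1504.0, 1508.0, 1512.0, 1512.0]
duration_s = 0.1  # hypothetical time spanned by the five samples

print(vector_length(f1, f2))      # 12.0
print(trajectory_length(f1, f2))  # 14.0
print(formant_velocity(f1, f2, duration_s))
```

In this toy token the trajectory bends in F1 and returns, so TL (14 Hz) exceeds VL (12 Hz). That gap is what lets a path-based measure register spectral movement that an endpoint-only measure would miss.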
Title: Dynamic formant trajectories in L2 English: evidence from Arabic-speaking adolescent learners. Journal of Phonetics, vol. 114, Article 101471.
Pub Date: 2026-01-01; Epub Date: 2026-01-07; DOI: 10.1016/j.wocn.2025.101472
Miao Zhang, Shuxiao Gong, Chengyu Guo
This study investigates tone sandhi and tonal coarticulation in disyllabic sequences in Changsha Xiang, a Sinitic language with six lexical tones in Hunan Province, China. Changsha Xiang exhibits two distinct tone sandhi patterns: right-dominant and left-dominant. Previous descriptions of these patterns have been largely impressionistic, lacking an instrumental phonetic analysis. Building on data from 16 native speakers, this study provides an acoustic analysis of the tonal realization, coarticulation, and phonetic reduction patterns in Changsha Xiang. Our results indicate that in right-dominant sequences, the non-dominant syllable exhibits reduced F0 excursion without neutralization, whereas in left-dominant sequences, the non-dominant syllable undergoes paradigmatic neutralization to four short level tones. Additionally, the study finds that the tone sandhi pattern affects tonal coarticulation, with a larger carryover effect in non-dominant final syllables. These findings contribute to a broader understanding of tonal processes in Sinitic languages and highlight parallels between tone sandhi and stress-tone interactions in other languages.
Title: Tone sandhi and tonal coarticulation in disyllabic sequences in Changsha Xiang. Journal of Phonetics, vol. 114, Article 101472.