Pub Date: 2025-07-15 | DOI: 10.1016/j.wocn.2025.101433
Bihua Chen , Isabelle Darcy
In casual speech, reduction of segments or even syllables is common. Native (L1) listeners recover these reduced forms by recruiting not only semantic and syntactic but also fine-grained acoustic cues in the surrounding utterance. Whether second-language (L2) listeners exploit the same constellation of cues is still poorly understood. We therefore compared 21 L1 English listeners and 21 Mandarin learners of English as they identified reduced targets (e.g., /tuɪnə/ ‘too into’) presented in one of three contexts: Isolation, Textual (orthographic context only), and Auditory (orthography plus the original carrier sentence). Accuracy patterns revealed a graded facilitation hierarchy. For L2 listeners, semantic-syntactic information alone (Textual) boosted recognition relative to Isolation, and adding acoustic context produced a further significant gain. Nevertheless, both effects were smaller for L2 than for L1 listeners, indicating less effective contextual integration in the L2 processing mechanism. The findings refine accounts of reduced-speech perception by showing that L2 listeners can harness acoustic context, but less efficiently than L1 listeners.
Title: "Effects of sentential context on nonnative recognition of reduced speech: Does meaning explain it all?" (Journal of Phonetics, vol. 112, Article 101433)
Pub Date: 2025-07-12 | DOI: 10.1016/j.wocn.2025.101428
Yuyu Zeng , Chang Wang , Jie Zhang
Incomplete neutralization occurs when two underlying contrastive sounds are phonologically neutralized but remain phonetically distinct (e.g., “latter” and “ladder” become homophonous when the intervocalic stops are flapped in American English). Its proper understanding is foundational to phonology and speech production. Using the incomplete neutralization of the Mandarin 3rd tone sandhi as a test case (T3 + T3 → T2 + T3), we confirmed the presence of this incomplete neutralization with generalized additive modeling (GAMM) and growth curve analysis (GCA). Crucially, we found that the two tones (T2 and T3) became more neutralized when speakers were additionally required to perform a concurrent verbal working memory task while speaking; similar patterns were found when pseudowords were tested, although the overall effects were weaker. Since the concurrent verbal working memory task is expected to add processing load and decrease cascading activation in the spoken word production process, our results suggest that cascading activation, which permits upstream distinctions to surface in downstream acoustics, drives incomplete neutralization. Our study shows how embracing cascading activation can inform the long-standing debate between discrete vs. exemplar representations/operations surrounding incomplete neutralization. How cascading activation is compatible with the core assumptions of generative phonology is also discussed.
Title: "Cascading activation in spoken word production drives incomplete neutralization: An internet-based study of Mandarin 3rd tone sandhi" (Journal of Phonetics, vol. 112, Article 101428)
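The growth curve analysis (GCA) used above models f0 contours with orthogonal polynomial time terms and tests whether curve shape differs by condition. A minimal sketch on synthetic contours (numpy-only least squares; the contour values and the 5 Hz slope difference are illustrative, not the paper's data):

```python
import numpy as np

def orthogonal_time(n_samples, order=2):
    """Orthogonal polynomial time terms (linear, quadratic, ...)."""
    t = np.linspace(-1, 1, n_samples)
    # Legendre polynomials are orthogonal on [-1, 1]
    cols = [np.polynomial.legendre.Legendre.basis(k)(t) for k in range(1, order + 1)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n = 50                                   # samples per contour
ot = orthogonal_time(n)                  # linear + quadratic terms

# Synthetic f0 contours: underlying T2 rises by 30 Hz per unit time,
# sandhi-derived T3 rises by only 25 Hz (i.e., incompletely neutralized)
t2 = 200 + 30 * ot[:, 0] + rng.normal(0, 1, n)
sandhi = 200 + 25 * ot[:, 0] + rng.normal(0, 1, n)

# One regression: intercept, time terms, condition, condition x time
cond = np.repeat([0.0, 1.0], n)          # 0 = underlying T2, 1 = sandhi
X_time = np.vstack([ot, ot])
X = np.column_stack([np.ones(2 * n), X_time, cond, cond[:, None] * X_time])
y = np.concatenate([t2, sandhi])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("condition x linear-time coefficient:", round(beta[4], 2))
```

A nonzero condition-by-linear-time coefficient is the GCA signature of a contour-shape difference between the two tones; a GAMM analysis asks the same question with smooth rather than polynomial time terms.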
Pub Date: 2025-07-11 | DOI: 10.1016/j.wocn.2025.101429
Anna Balas , Krzysztof Hwaszcz , Kamil Kaźmierski , Magdalena Wrembel
This paper investigates the perceived cross-linguistic similarity of retroflexes from a broad multilingual perspective by employing trilingual and bilingual learners and native users as three distinct listener groups. Previous research has demonstrated that L2 learners rely on their L1 in non-native speech perception. However, no study has examined how L3 learners perceive differences between retroflex sounds in their L1, L2, and L3. In a series of three parallel studies, we examined cross-linguistic similarity of Norwegian retroflexes and similar retroflex and non-retroflex sounds by trilingual (L1 Polish, L2 English and L3 Norwegian), bilingual (L1 Polish, L2 English) and Norwegian control (L1 Norwegian, L2 English) listeners. The listeners assessed similarity between the Norwegian and Polish or English sounds primarily based on the place and manner of articulation rather than retroflexion. In a model where condition coded the presence or absence of agreement in retroflexion and in place/manner of articulation, all two-way interactions (condition:language, condition:group, language:group) and the three-way interaction were significant. The study revealed that experience with a given language did not influence similarity ratings in a wholesale manner but rather in a precise manner related to the presence or absence of retroflexion. The results also showed that the perceived cross-linguistic similarity by multilinguals was gradient in nature. The study provides new insights into research on the perception of retroflexes and multilingual perception by participants differing in the amount of experience with the languages of the stimuli: from L1 controls through L2 and L3 learners to naïve listeners.
Title: "Perceived cross-linguistic similarity of retroflexes in trilingual, bilingual and native listener groups" (Journal of Phonetics, vol. 112, Article 101429)
Pub Date: 2025-07-10 | DOI: 10.1016/j.wocn.2025.101430
Miquel Simonet , Marta Ramírez Martínez , Francesc Torres-Tamarit
This study explores the acoustics of velar palatalization in two subvarieties of Majorcan Catalan, Manacor (palatalizing) and Artà (nonpalatalizing). Three production studies are reported: i) a study of /k/-fronting in the context of front, central, and back vowels; ii) a study of /a/-fronting in the context of /k/ and /p/; and iii) a study of /k/-fronting in various vowel contexts in the participants’ L2, Spanish. First, while we captured /k/-fronting in the progression /o/ > /a ə/ > /i/ in both subvarieties, effect sizes were much larger in Manacor than in Artà. There were no group differences in the acoustics of /k/ in the context of the back vowel, but there were large differences in the other vowel contexts, particularly before the central vowels. We postulate that, whereas the degree of palatalization found in Artà may result from universal coarticulatory principles, palatalization in Manacor results from speaker-controlled phonetic behavior: enhanced coarticulation. Second, we found that in Manacor (but not Artà) /a/ was more fronted when it followed /k/ than when it followed /p/. We suggest that the /a/-fronting pattern found in Manacor results from the influence of its velar-palatalization process and not vice versa. Finally, we found that the enhanced velar-palatalization process in the Manacor sample was not transferred to their L2. We discuss the implications of our conclusion for our understanding of the diachrony of velar palatalization in Romance.
Title: "Velar palatalization, phonologization, and sound change – A comparative acoustic study of /k/-fronting in Majorcan Catalan" (Journal of Phonetics, vol. 112, Article 101430)
Pub Date: 2025-07-10 | DOI: 10.1016/j.wocn.2025.101431
Jeanne Brown , Morgan Sonderegger
This paper examines the acoustic correlates of creaky voice across language, gender and year of birth (YOB) to investigate 1) the reliability of cross-linguistic differences in voice quality, 2) the direction and extent of gender differences with respect to creaky voice, and 3) the existence of an ongoing sound change targeting voice quality. Spontaneous speech from 49 Canadian English-French bilingual speakers was collected from publicly available online data sources. This corpus was processed and a range of acoustic measures of voice quality extracted using an automated pipeline with manual checks. Results show neither strong nor consistent evidence for cross-linguistic differences in creak. Regarding gender, men’s voices are unequivocally creakier, indicated by more unreliable f0 tracks, lower H1*–H2*, lower CPP and lower HNR < 500 Hz. As for age, results generally show more creak for older speakers, with CPP and HNR < 500 Hz values increasing with YOB, while other acoustic measures show no significant differences, suggesting that these effects are more likely due to vocal aging than sound change in progress. Contrary to popular perception and to recent work claiming that young women are leaders in creaky voice use, this study finds that the acoustic correlates of creak show the exact opposite: men’s voices are creakier and, if anything, younger speakers are less creaky. Possible reasons for this discrepancy are discussed in light of recent perceptual work on creaky voice.
Title: "A sociophonetic study of creaky voice across language, gender and age in Canadian English-French bilinguals" (Journal of Phonetics, vol. 112, Article 101431)
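Cepstral peak prominence (CPP), one of the creak correlates above, is the height of the cepstral peak in the plausible-f0 quefrency band above a linear regression baseline; creaky (aperiodic) voice yields lower values. A minimal single-frame sketch (numpy only; the frame length, band limits, and test signals are illustrative choices, not the paper's pipeline):

```python
import numpy as np

def cpp(frame, sr, f0_min=60.0, f0_max=300.0):
    """Cepstral peak prominence of one frame: cepstral peak height in the
    f0 search band, measured above a linear regression baseline."""
    log_mag = 20 * np.log10(np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) + 1e-12)
    ceps = np.fft.irfft(log_mag)                 # real cepstrum
    quef = np.arange(len(ceps)) / sr             # quefrency axis (seconds)
    lo, hi = int(sr / f0_max), int(sr / f0_min)  # search band for the f0 peak
    peak = lo + int(np.argmax(ceps[lo:hi]))
    slope, intercept = np.polyfit(quef[lo:hi], ceps[lo:hi], 1)
    return ceps[peak] - (slope * quef[peak] + intercept)

sr = 16000
pulses = np.zeros(4096)
pulses[::160] = 1.0                              # glottal-pulse-like train, f0 = 100 Hz
noise = np.random.default_rng(1).normal(size=4096)
print(cpp(pulses, sr), cpp(noise, sr))           # periodic frame scores higher
```

A corpus pipeline would compute this per frame over voiced regions and average; lower mean CPP then patterns with creakier voice quality.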
Pub Date: 2025-06-18 | DOI: 10.1016/j.wocn.2025.101427
Michael C. Stern, Jason A. Shaw
We investigate the dynamics of labial constriction trajectories during the production of /b/ and /m/ in English and Mandarin in two prosodic contexts. We find that, across languages and contexts, the ratio of instantaneous displacement to instantaneous velocity generally follows an exponential decay curve from movement onset to movement offset. We formalize this empirical discovery in a differential equation and, in combination with an assumption of point attractor dynamics, derive a nonlinear second-order dynamical system describing labial constriction trajectories. The equation has only two parameters, T and r: T corresponds to the target state and r corresponds to movement rapidity. Thus, each of the parameters corresponds to a phonetically relevant dimension of control. Nonlinear regression demonstrates that the model provides excellent fits to individual movement trajectories. Moreover, trajectories simulated from the model qualitatively match empirical trajectories, and capture key kinematic variables like duration and peak velocity. The model constitutes a proposal for the dynamics of individual articulatory movements, and thus offers a novel foundation from which to understand additional influences on articulatory kinematics like prosody, inter-movement coordination, and stochastic noise.
Title: "Nonlinear second-order dynamics describe labial constriction trajectories across languages and contexts" (Journal of Phonetics, vol. 111, Article 101427)
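The two-parameter nonlinear regression can be illustrated on a synthetic trajectory. Since the abstract does not give the derived equation, a critically damped point attractor with target T and rapidity r is used here as a generic stand-in (not the authors' model); scipy's `curve_fit` recovers both parameters:

```python
import numpy as np
from scipy.optimize import curve_fit

def trajectory(t, T, r):
    """Critically damped movement toward target T with rapidity r.
    Step response of x'' + 2r x' + r^2 x = r^2 T, with x(0) = x'(0) = 0.
    A generic point-attractor stand-in, not the equation derived in the paper."""
    return T * (1 - (1 + r * t) * np.exp(-r * t))

rng = np.random.default_rng(2)
t = np.linspace(0, 0.4, 200)                        # a 400 ms movement
observed = trajectory(t, T=12.0, r=25.0) + rng.normal(0, 0.05, t.size)

(T_hat, r_hat), _ = curve_fit(trajectory, t, observed, p0=[10.0, 20.0])
print(f"T = {T_hat:.2f} (target), r = {r_hat:.1f} (rapidity)")
```

The point is methodological: with only a target parameter and a rapidity parameter, a point-attractor system can be fit directly to individual movement trajectories, and the fitted values are phonetically interpretable.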
Pub Date: 2025-06-18 | DOI: 10.1016/j.wocn.2025.101426
Taehong Cho
This review, part of the journal’s special collection on Advancements of Phonetics in the 21st Century, examines the interplay between phonetic universals and language variation at both segmental and utterance levels. It traces the physiological and biomechanical foundations of phonetic universals established by 20th-century research while focusing on cross-linguistic variation explored predominantly in 21st-century research. Segmental phonetic universals include the role of the syllable in organizing segments and gestures, intrinsic vowel duration influenced by vowel height, extrinsic vowel duration due to coda voicing, intrinsic and co-intrinsic f0 variation affected by vowel height and onset consonant characteristics, respectively, and place effects on closure duration and VOT. While segmental universals stem from distinct mechanical bases, utterance-level universals emerge from respiratory and articulatory resets at utterance onset, shaping the entire speech production system—a perspective substantiated here based primarily on 21st-century phonetic research. These resets structure prosodic organization, leading to weakening effects at the right edge (e.g., f0 declination, articulatory declination, phrase-final lengthening) and strengthening effects at the left edge (e.g., domain-initial strengthening) and occasionally at the right edge as well (e.g., phrase-final strengthening) when sufficient time permits. Extensive evidence demonstrates that phonetic universals are further shaped by language-specific factors and the interaction between system-oriented and output-oriented constraints. This diversity calls for detailed phonetic descriptions tailored to each language, with phonetic grammar, as proposed here, fine-tuning phonetic realization accordingly. Research in the 21st century has also illuminated that segmental and utterance-level universals, traditionally regarded as distinct, are deeply interconnected, if not inseparable. 
The Extended Model of Phonetic Grammar is introduced as a framework for mediating this relationship within the phonetics-prosody interface as well as interactions with other higher-order linguistic structures. Furthermore, language variation within phonetic universals suggests that many phonetic processes, once considered automatic, are actively controlled by speakers, reflecting the unique evolutionary pathways of different languages.
Title: "Advancements of phonetics in the 21st century: Phonetic universals, language variation, and phonetic grammar" (Journal of Phonetics, vol. 111, Article 101426)
Pub Date: 2025-06-14 | DOI: 10.1016/j.wocn.2025.101425
Elizabeth K. Johnson , Katherine S. White
Infant speech perception emerged as a field late in the 20th century. Early work focused on defining the initial state, and documenting the timecourse of changes in speech perception over the first year of life. At the turn of the century, attention shifted from studying when children became attuned to their native language, to asking how children achieved this transformation. Statistical learning became the dominant mechanism to explain language development. But, as researchers pushed the bounds of statistical learning, different questions took center stage: given the complexity of spoken language, how do infants determine which regularities to track? And are the patterns infants track influenced by their unique language learning environment? Inspired by these questions, researchers have shifted to studying acquisition across more diverse contexts, and to using dense corpora and big data approaches to examine how individual differences in children’s input relate to speech perception in the lab. In this paper, we first review this progression, summarizing how the field has arrived at the current state of the art. We then argue that the time is ripe for the development of new theoretical approaches, and sketch out the loose contours of SLED, a new 21st-century proposal that emphasizes the role of sociophonetic variation and the richness of the speech signal in early development. With advanced tools in hand and data from a wide variety of learning contexts increasingly available, we are excited to see how the field will evolve over the next 25 years.
Title: "Advancements in phonetics in the 21st century: Infant speech development" (Journal of Phonetics, vol. 111, Article 101425)
Pub Date : 2025-05-22DOI: 10.1016/j.wocn.2025.101418
Ji Young Kim
Spanish has many minimal stress pairs, and lexical stress in Spanish is marked primarily via suprasegmental cues. Thus, sensitivity to suprasegmental information is crucial for spoken-word identification in Spanish. Using stimuli produced by speakers of Mexican Spanish with varying language learning experience (i.e., monolingual speakers, heritage speakers, L2 learners), this study examines native listeners’ identification of Spanish lexical stress under enhanced variability in phonetic cues. Our data demonstrate that listeners exploit various stress correlates in the speech signal and assign different weights to them, which is context-specific; when there is a pitch accent, native listeners mainly attend to f0-related cues, whereas in the absence of a pitch accent, intensity plays a stronger role. Our data also show that clustering based on stress correlates is not consistent with the predetermined monolingual-heritage-L2 group division, which indicates that language learning experience alone is not sufficient to explain how Spanish speakers mark stress. This study highlights the importance of incorporating variable speech data in speech perception research and adopting a data-driven, individual-centered approach to speaker grouping in cross-sectional studies.
{"title":"Relative importance of stress correlates in native listeners’ identification of Spanish lexical stress produced by monolingual and bilingual speakers","authors":"Ji Young Kim","doi":"10.1016/j.wocn.2025.101418","DOIUrl":"10.1016/j.wocn.2025.101418","url":null,"abstract":"<div><div>Spanish has many minimal stress pairs, and lexical stress in Spanish is marked primarily via suprasegmental cues. Thus, sensitivity to suprasegmental information is crucial for spoken-word identification in Spanish. Using stimuli produced by speakers of Mexican Spanish with varying language learning experience (i.e., monolingual speakers, heritage speakers, L2 learners), this study examines native listeners’ identification of Spanish lexical stress under enhanced variability in phonetic cues. Our data demonstrate that listeners exploit various stress correlates in the speech signal and assign different weights to them, which is context-specific; when there is a pitch accent, native listeners mainly attend to f0-related cues, whereas in the absence of a pitch accent, intensity plays a stronger role. Our data also show that clustering based on stress correlates is not consistent with the predetermined monolingual-heritage-L2 group division, which indicates that language learning experience alone is not sufficient to explain how Spanish speakers mark stress. 
This study highlights the importance of incorporating variable speech data in speech perception research and adopting a data-driven, individual-centered approach to speaker grouping in cross-sectional studies.</div></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"111 ","pages":"Article 101418"},"PeriodicalIF":1.9,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144115751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-05-15DOI: 10.1016/j.wocn.2025.101416
Yaqian Huang
Period doubling, an under-studied yet frequently occurring subtype of creaky voice, has acoustic and phonatory properties distinct from those of vocal fry, the best-known and most-studied subtype of creaky voice. Little is known about their distributional patterns across tones or utterances, let alone their potentially different functions in informing linguistic meaning and categories. In this paper, I investigate the tonal and phrasal influences on the distribution of these two voicing types as they occur sub-phonemically in Mandarin Chinese. The results show that both creak subtypes occur most frequently in Tones 3 and 2, and that period doubling is more widespread across tones than vocal fry. Period doubling occurs most frequently at utterance edges, with its frequency gradually increasing toward the end of utterances, possibly reflecting vocal instability. Vocal fry, in contrast, is concentrated in the post- and pre-focal positions conditioned by the sentence-medial stimuli and in utterance-final positions, suggesting a stronger linguistic role in marking weak prosodic elements. This study also discusses implications for speech production and linguistic functions of different kinds of creak.
{"title":"The role of tone and phrasing in the occurrence of period doubling and vocal fry in Mandarin","authors":"Yaqian Huang","doi":"10.1016/j.wocn.2025.101416","DOIUrl":"10.1016/j.wocn.2025.101416","url":null,"abstract":"<div><div>Period doubling, an under-studied yet frequently occurring subtype of creaky voice, has acoustic and phonatory properties distinct from those of vocal fry, the best-known and most-studied subtype of creaky voice. Little is known about their distributional patterns across tones or utterances, let alone their potentially different functions in informing linguistic meaning and categories. In this paper, I investigate the tonal and phrasal influences on the distribution of these two voicing types as they occur sub-phonemically in Mandarin Chinese. The results show that both creak subtypes occur most frequently in Tones 3 and 2, and that period doubling is more widespread across tones than vocal fry. Period doubling occurs most frequently at utterance edges, with its frequency gradually increasing toward the end of utterances, possibly reflecting vocal instability. Vocal fry, in contrast, is concentrated in the post- and pre-focal positions conditioned by the sentence-medial stimuli and in utterance-final positions, suggesting a stronger linguistic role in marking weak prosodic elements. 
This study also discusses implications for speech production and linguistic functions of different kinds of creak.</div></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"111 ","pages":"Article 101416"},"PeriodicalIF":1.9,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144069107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}