Pub Date: 2024-09-12; DOI: 10.1016/j.wocn.2024.101352
Patrice Speeter Beddor, Andries W. Coetzee, Ian Calloway, Stephen Tobin, Ruaridh Purse
When language users accommodate a novel phonetic variant, they adjust their perceptual and articulatory spaces in listener- and speaker-specific ways. Motivated by the centrality of accommodation and the perception-production relation to theories of phonetics and sound change, this study tests the hypothesis that individuals who are adept at perceptually retuning for a novel variant will be more accurate imitators of that form. In perceptual eye-tracking and spontaneous imitation ultrasound-imaging tasks, 37 American English participants were exposed to a talker’s novel raised /æ/ before /ɡ/ (bag), and to their familiar unraised /æk/ (back) and /eɪk/ (bake). Consistent with the hypothesis, results showed that the more participants showed perceptual facilitation (i.e., used raised /æ(ɡ)/ to disambiguate back-bag trials), the more they imitated raised /æ(ɡ)/. Perceptual retuning, though, did not predict articulatory restructuring: imitators produced not context-dependent raising, but more general “imitative” raising. For theories of sound change, the findings provide circumscribed support for especially adept perceptual adapters to an innovation having the potential to be strong disseminators of that variant. For theories of accommodation, findings point toward the importance of studying imitation of a targeted variant in the broader context of how talkers and imitators situate that variant in relation to phonetically similar forms.
Title: The relation between perceptual retuning and articulatory restructuring: Individual differences in accommodating a novel phonetic variant. Journal of Phonetics, vol. 107, Article 101352.
Pub Date: 2024-09-06; DOI: 10.1016/j.wocn.2024.101353
Richard Hatcher, Hyunjung Joo, Sahyang Kim, Taehong Cho
This study investigates the phonetic realization of contrastive focus in short utterances in Seoul Korean, a so-called 'edge-prominence' language, which is assumed to express focus-induced prominence primarily through phrasing. The study explores how the distribution of phrase-level tones and their realization is influenced by focus in different positions of target words with different coda segmental makeups (/pam, pap/). Phrase-initially, focus displays a typical phrase-initial f0 rise for the L and H tones, with the L tone anchored to the focused monosyllabic word and the H tone to the following syllable, accompanied by a tonal expansion. This expansion results from an elevated f0 peak for the H while the L remains unchanged, showing tonal hyperarticulation only in the H tone. Phrase-medially, a similar f0 rise occurs under focus, but without robust tonal expansion. Crucially, the f0 rise is not accompanied by clear temporal or tonal evidence for the creation of a new phrase, demonstrating focus realization without phrasing. Phrase-finally, focus also shows no phrasing evidence. It results in an f0 fall, possibly due to tonal crowding of the L and H tones with the upcoming low boundary tone. However, this fall is distinct from a similar fall under no focus, suggesting a phonetic trace of the focal rise. Both initially and medially, the tonal realization of the f0 rise is affected by the segmental makeup (/pap/ vs. /pam/) only at the microprosodic level while maintaining the tonal targets, even in the face of physically adverse conditions for an f0 rise through the voiceless gap. The findings of the present study illuminate the intricate phonetic details of focus realization with an f0 rise in a language other than the well-studied West Germanic and Romance languages which employ word-level stress. The findings also shed new light on the relationship between focus and prosodic phrasing, implying that focus, previously argued to drive prosodic phrasing in Seoul Korean, is just one of several potentially competing structures that determine a sentence’s phrasing, thereby underscoring the multidimensional nature of prosodic structure.
Title: Focus-induced tonal distribution in Seoul Korean as an edge-prominence language. Journal of Phonetics, vol. 107, Article 101353.
Pub Date: 2024-09-05; DOI: 10.1016/j.wocn.2024.101355
Jennifer Kuo
Paradigms with conflicting data patterns can be difficult to learn, resulting in a type of language change called reanalysis. Existing models of morphophonology predict reanalysis to occur in a way that matches frequency distributions within the paradigm. Using evidence from Samoan, this paper argues that in addition, reanalysis may be constrained by phonotactics (global distributional regularities in the lexicon) and phonetic substance. More concretely, I find that reanalysis of Samoan thematic consonants generally matches distributional patterns within the paradigm. However, reanalysis is also modulated by a phonotactic dispreference against sequences of homorganic consonants, analyzed here in Optimality Theoretic terms by OCP-place. These results are supported by an iterated learning model that is based in MaxEnt (Goldwater and Johnson, 2003). In a study where phonetic similarity is measured as the spectral distance between two phones, I find that similarity of consonants is closely correlated with the strength of OCP-place effects in Samoan; this suggests that OCP-place is rooted in phonetic similarity avoidance, and more generally that in reanalysis, speakers preferentially utilize phonetically-motivated phonotactics.
Title: Phonetic naturalness in the reanalysis of Samoan thematic consonant alternations. Journal of Phonetics, vol. 107, Article 101355.
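As a rough illustration of the MaxEnt framework named in the abstract above (Goldwater and Johnson, 2003), a candidate's probability is proportional to the exponential of its negative weighted violation sum. The sketch below shows only that scoring step, not the paper's iterated learning model. The candidate labels, the second constraint, and all weights and violation counts are hypothetical placeholders chosen for the example.

```python
import math


def maxent_probs(violations, weights):
    """Return MaxEnt probabilities for a set of competing candidates.

    violations: dict mapping candidate label -> list of violation counts,
                one count per constraint (same order as `weights`).
    weights:    list of non-negative constraint weights.
    """
    # Harmony penalty: weighted sum of violations for each candidate.
    penalties = {
        cand: sum(w * v for w, v in zip(weights, viols))
        for cand, viols in violations.items()
    }
    # Normalising constant over all candidates.
    z = sum(math.exp(-h) for h in penalties.values())
    return {cand: math.exp(-h) / z for cand, h in penalties.items()}


# Hypothetical setup: constraint 0 stands in for OCP-place, constraint 1 for
# a paradigm-frequency-matching constraint. Values are illustrative only.
weights = [2.0, 1.0]
violations = {
    "homorganic_candidate": [1, 0],      # violates OCP-place once
    "non_homorganic_candidate": [0, 1],  # violates the frequency constraint once
}
probs = maxent_probs(violations, weights)
```

With these made-up weights the candidate that violates the more heavily weighted constraint receives the lower probability, which is the general sense in which a phonotactic constraint can modulate an otherwise frequency-driven outcome.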
Pub Date: 2024-09-01; DOI: 10.1016/j.wocn.2024.101354
Rasmus Puggaard-Rode
This paper provides evidence for the assumption that the precise phonetic implementation of laryngeal contrast in obstruents can have an influence on higher order linguistic structure. Traditional varieties of Jutland Danish – which are all broadly ‘aspirating’ varieties – are used as a case study. The paper shows that the precise implementation of the aspirated–unaspirated contrast in stops varied systematically in these varieties, and that this covaries with the morphophonological process of stop gradation. Stop gradation is a lenition process which is historically found in the entire Danish-speaking area, but with quite varying outcomes, which were mapped extensively by dialectologists more than a century ago. Using a large legacy corpus of sociolinguistic interviews from the 1970s, this study shows that more sonorous outcomes of stop gradation covary with higher rates of continuous closure voicing in /b d g/ and shorter aspiration in /p t k/, and vice versa for less sonorous outcomes of stop gradation.
Title: Variation in fine phonetic detail can modulate the outcome of sound change: The case of stop gradation and laryngeal contrast implementation in Jutland Danish. Journal of Phonetics, vol. 106, Article 101354.
Pub Date: 2024-08-03; DOI: 10.1016/j.wocn.2024.101351
Carla Wikse Barrow, Sofia Strömbergsson, Marcin Włodarczak, Mattias Heldner
In this study, we explore individual variation and contrast in Swedish children’s voiceless fricatives. Thirty-one children between three and eight years of age participated in a picture-prompted word repetition task, wherein they repeated fricative-initial words in a variety of vowel contexts. The fricatives were transcribed and acoustically analysed, using spectral moments 1–4, spectral peak and spectral balance measures. Random forests were used to estimate the relative importance of each spectral feature in the classification of correct fricative productions, as well as to measure robustness of the late-emerging contrast between sibilants [s] and [ɕ] in individual children. Transcription analysis revealed that substitutions involving a more anterior place of articulation were common. Acoustic analysis showed individual differences in variability and contrast in the children’s fricative systems across and within age groups. Cue weighting of spectral characteristics in classification was similar in all age groups for correct productions, while the magnitude of the acoustic contrast between sibilants increased with age. This paper provides a description of individual variation in Swedish children’s acquisition of fricatives which can inform future large-scale speech-acquisition research.
Title: Individual variation in the realisation and contrast of Swedish children’s word-initial voiceless fricatives. Journal of Phonetics, vol. 106, Article 101351.
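The "spectral moments 1–4" mentioned in the abstract above are commonly computed by treating the power spectrum as a probability distribution over frequency. The following is a minimal sketch under that assumption; it is not the authors' analysis pipeline. The function name, the use of standard deviation for the second moment, and the use of excess kurtosis for the fourth are choices made here for illustration, and other conventions exist.

```python
import math


def spectral_moments(freqs, power):
    """Return (centre of gravity, SD, skewness, excess kurtosis).

    freqs: frequencies in Hz for each spectral bin.
    power: non-negative power values for the same bins.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    # Normalise power so it sums to 1, like a probability distribution.
    p = [x / total for x in power]
    # Moment 1: centre of gravity (mean frequency).
    m1 = sum(f * w for f, w in zip(freqs, p))
    # Moment 2: variance around the mean, reported here as its square root.
    var = sum((f - m1) ** 2 * w for f, w in zip(freqs, p))
    if var == 0:
        raise ValueError("all energy is in one bin; higher moments undefined")
    sd = math.sqrt(var)
    # Moment 3: skewness (normalised third central moment).
    skew = sum((f - m1) ** 3 * w for f, w in zip(freqs, p)) / sd ** 3
    # Moment 4: excess kurtosis (normalised fourth central moment minus 3).
    kurt = sum((f - m1) ** 4 * w for f, w in zip(freqs, p)) / sd ** 4 - 3.0
    return m1, sd, skew, kurt


# Toy symmetric spectrum: three bins with energy peaked at 2000 Hz.
cog, sd, skew, kurt = spectral_moments([1000.0, 2000.0, 3000.0], [1.0, 2.0, 1.0])
```

For sibilants, a higher centre of gravity is typically associated with a more anterior constriction, which is why measures of this kind are often used to quantify a contrast such as [s] versus [ɕ].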
Pub Date: 2024-07-22; DOI: 10.1016/j.wocn.2024.101341
Chenzi Xu
With an aim to investigate the nature of Mandarin neutral tone through the lens of language variation and change, this study examines the pitch patterns of speech sequences containing neutral tone syllables, i.e. those that do not have any of the four canonical lexical tones and are often overlooked in prior studies of tones, in two Mandarin varieties: Standard Mandarin and Plastic Mandarin spoken in Changsha, China. Using Generalised Additive Mixed Models, the study shows (a) that f0 contours of a sequence of neutral tone syllables following various lexical tones converge in the end at a low pitch in both Mandarin varieties, and (b) that the low pitch target of neutral tone syllables tends to be the same across the two Mandarin varieties. The cross-dialectal comparison favours the phonological account that neutral tone is underlyingly underspecified and attracts the boundary tone. It suggests that the constant pitch target across two Mandarin varieties with distinct lexical tone contours may be attributed to the stable transfer of prosodic structure in the Standard-Plastic variation.
Title: Cross-dialectal perspectives on Mandarin neutral tone. Journal of Phonetics, vol. 106, Article 101341.
Pub Date: 2024-07-16; DOI: 10.1016/j.wocn.2024.101342
Alvin Cheng-Hsien Chen
This study explores pitch variability in language production and its implication for processing advantages of holistic units, with a specific focus on the relationship between disyllabic word production and their distributional properties in language use. Using a 185-million-word native corpus as a proxy for the statistical properties of native usage, the study examines how pitch variability of disyllabic words in a spontaneous speech corpus of Taiwan Mandarin is influenced by lexical frequency, predictive contingencies, and retrodictive contingencies. Building upon the duration-based pairwise variability index (PVI), this study introduces two variants of pitch-related PVI (f0PVI) to quantify pitch variability within speech segments. We assess their effectiveness through three phonetic analyses. The first analysis shows that disyllabic words exhibit significantly lower f0PVI values than their non-holistic part-word counterparts, indicating the metric’s capability to distinguish holistic linguistic units. The second analysis uncovers a significant inverse correlation between the pitch variability metrics of disyllabic words and their frequency values, highlighting a strong link between reduced prosodic prominence and the frequency-based processing advantages in lexical production. Finally, the third analysis demonstrates moderated effects of retrodictive lexical contingency on pitch variability, contingent on the word’s alignment with prosodic junctures. We discuss the implications of contextual predictability in lexical retrieval and its role in the dynamic planning process of speech production. Our findings underscore f0PVI as a robust prosodic measure for the automatized processing and entrenchment of linguistic units arising from repeated usage.
Title: Pitch variability in spontaneous speech production and its connection to usage-based grammar. Journal of Phonetics, vol. 106, Article 101342.
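For readers unfamiliar with the pairwise variability index that the abstract above builds on, the widely used normalised form averages the absolute difference between successive values, each divided by the mean of the pair, and scales the result by 100. The sketch below applies that standard formula to a sequence of f0 values purely as an illustration. The abstract does not define the paper's two f0PVI variants, so this should not be read as either of them, and the sample values are invented.

```python
def npvi(values):
    """Normalised pairwise variability index over successive positive values.

    nPVI = 100 * mean(|a - b| / ((a + b) / 2)) over adjacent pairs (a, b).
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    terms = []
    for a, b in zip(values, values[1:]):
        pair_mean = (a + b) / 2.0
        # Each term is the pair's difference relative to the pair's mean.
        terms.append(abs(a - b) / pair_mean)
    return 100.0 * sum(terms) / len(terms)


# Invented f0 samples in Hz: a flat contour and one with a large step.
flat = npvi([100.0, 100.0, 100.0])
step = npvi([100.0, 200.0])
```

A flat contour yields zero and larger successive differences yield larger values, so lower scores correspond to less pitch movement between adjacent units.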
Pub Date: 2024-07-01; DOI: 10.1016/j.wocn.2024.101340
Jill C. Thorson, Rachel Steindel Burdin
This study explores downstepping in Mainstream US English using three experiments. Experiment 1 investigated if downstep was associated with accessible referents. Pairs of scenarios were constructed: one with new information and one with accessible. Two versions of the target utterances were recorded (one with high star, and one with downstepping) and presented in the accessible and new contexts. The high star contour was preferred overall, but less so in accessible contexts. A statistical model showed an effect of the phonetic implementation of the contour. Experiment 2 examined the phonetic realizations of the utterances in Experiment 1 using a categorical perception discrimination task. Participants showed linear perception within the downstep contours but a categorical difference between the high star and downstep contours. Experiment 3 explored the interpretations attached to downstepping. Listeners showed a categorical difference between high star and downstep contours for interpretation, hearing downstep as indicating something had happened before, and more resigned, disappointed, and less clear than high star contours. There was also variation within the downstep contours based on phonetic implementation of the contour. We show that downstep contours have distinct meanings from high star contours, and that these meanings may be mediated by their phonetic implementation.
Title: Phonetic implementation and the interpretation of downstepping in Mainstream US English. Journal of Phonetics, vol. 105, Article 101340.
Pub Date : 2024-06-20 DOI: 10.1016/j.wocn.2024.101338
Anqi Xu, Daniel R. van Niekerk, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Santitham Prom-on, Lorna F. Halliday, Yi Xu
It has long been a mystery how children learn to speak without formal instruction. Previous research has used computational modelling to help solve the mystery by simulating vocal learning with direct imitation or caregiver feedback, but has encountered difficulty in overcoming the speaker normalisation problem, namely, discrepancies between children’s vocalisations and those of adults due to age-related anatomical differences. Here we show that vocal learning can be successfully simulated via recognition-guided vocal exploration without explicit speaker normalisation. We trained an articulatory synthesiser with three-dimensional vocal tract models of an adult and two child configurations of different ages to learn monosyllabic English words consisting of CVC syllables, based on coarticulatory dynamics and two kinds of auditory feedback: (i) acoustic features to simulate universal phonetic perception (or direct imitation), and (ii) a deep-learning-based speech recogniser to simulate native-language phonological perception. Native listeners were invited to evaluate the learned synthetic speech with natural speech as a baseline reference. Results show that the English words trained with the speech recogniser were more intelligible than those trained with acoustic features, sometimes approaching natural speech. The successful simulation of vocal learning in this study suggests that a combination of coarticulatory dynamics and native-language phonological perception may also be critical for real-life vocal production learning.
Title : Artificial vocal learning guided by speech recognition: What it may tell us about how children learn to speak. Journal of Phonetics, Volume 105, Article 101338.
Open access PDF : https://www.sciencedirect.com/science/article/pii/S0095447024000445/pdfft?md5=941cb45273d2db483f6143ef8085a741&pid=1-s2.0-S0095447024000445-main.pdf
Pub Date : 2024-06-10 DOI: 10.1016/j.wocn.2024.101339
Wei-Rong Chen, Michael C. Stern, D.H. Whalen, Donald Derrick, Christopher Carignan, Catherine T. Best, Mark Tiede
Ultrasound imaging of the tongue is biased by probe movements relative to the speaker’s head. Two common remedies are restricting such movements or algorithmically compensating for them, each with its own challenges. We describe these challenges in detail and evaluate an open-source, adjustable probe stabilizer for ultrasound (ALPHUS), specifically designed to address them by restricting uncorrectable probe movements while allowing correctable ones (e.g., jaw opening) to facilitate naturalness. The stabilizer is highly modular and adaptable to different users (e.g., adults and children) and different research/clinical needs (e.g., imaging in both midsagittal and coronal orientations). The results of three experiments show that probe movement over uncorrectable degrees of freedom was negligible, while movement over correctable degrees of freedom that could be compensated through post-processing alignment was relatively large, indicating unconstrained articulation over parameters relevant for natural speech. Results also showed that probe movements as small as 5 mm or 2 degrees can neutralize phonemic contrasts in ultrasound tongue positions. This demonstrates that while stabilized but uncorrected ultrasound imaging can provide reliable tongue shape information (e.g., curvature or complexity), accurate tongue position (e.g., height or backness) with respect to vocal tract hard structures requires correction for probe displacement relative to the head.
Title : Assessing ultrasound probe stabilization for quantifying speech production contrasts using the Adjustable Laboratory Probe Holder for UltraSound (ALPHUS). Journal of Phonetics, Volume 105, Article 101339.