Pub Date: 2024-07-22 DOI: 10.1016/j.wocn.2024.101341
Cross-dialectal perspectives on Mandarin neutral tone
Chenzi Xu
Journal of Phonetics, vol. 106, Article 101341

To investigate the nature of Mandarin neutral tone through the lens of language variation and change, this study examines the pitch patterns of speech sequences containing neutral tone syllables, i.e. syllables that bear none of the four canonical lexical tones and are often overlooked in studies of tone, in two Mandarin varieties: Standard Mandarin and the Plastic Mandarin spoken in Changsha, China. Using Generalised Additive Mixed Models, the study shows (a) that the f0 contours of a sequence of neutral tone syllables following various lexical tones converge on a low pitch by the end of the sequence in both varieties, and (b) that this low pitch target tends to be the same across the two varieties. The cross-dialectal comparison favours the phonological account that neutral tone is underlyingly underspecified and attracts the boundary tone. It suggests that the constant pitch target across two Mandarin varieties with distinct lexical tone contours may be attributed to the stable transfer of prosodic structure in the Standard-Plastic variation.
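The convergence pattern the abstract reports can be illustrated with a minimal sketch: if neutral tone attracts a low boundary tone, f0 contours that begin far apart (reflecting the preceding lexical tone) should end close together. The numbers and the spread measure below are invented for illustration; they are not the study's data or its GAMM analysis.

```python
def contour_spread(contours):
    """Cross-context f0 spread (max minus min, in Hz) at the first and
    last samples of time-aligned f0 contours. Toy stand-in for the
    convergence finding; not the study's statistical method."""
    first = [c[0] for c in contours]
    last = [c[-1] for c in contours]
    return max(first) - min(first), max(last) - min(last)

# Hypothetical neutral tone sequences after four preceding lexical tones:
contours = [
    [220.0, 180.0, 150.0],  # after a high tone
    [180.0, 165.0, 152.0],  # after a mid tone
    [150.0, 149.0, 148.0],  # after a low tone
    [120.0, 140.0, 149.0],  # after a rising tone
]
start_spread, end_spread = contour_spread(contours)
```

Under the convergence account, `end_spread` comes out much smaller than `start_spread`.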
Pub Date: 2024-07-16 DOI: 10.1016/j.wocn.2024.101342
Pitch variability in spontaneous speech production and its connection to usage-based grammar
Alvin Cheng-Hsien Chen
Journal of Phonetics, vol. 106, Article 101342

This study explores pitch variability in language production and its implications for the processing advantages of holistic units, with a specific focus on the relationship between the production of disyllabic words and their distributional properties in language use. Using a 185-million-word native corpus as a proxy for the statistical properties of native usage, the study examines how the pitch variability of disyllabic words in a spontaneous speech corpus of Taiwan Mandarin is influenced by lexical frequency, predictive contingencies, and retrodictive contingencies. Building upon the duration-based pairwise variability index (PVI), this study introduces two variants of a pitch-related PVI (f0PVI) to quantify pitch variability within speech segments. We assess their effectiveness through three phonetic analyses. The first analysis shows that disyllabic words exhibit significantly lower f0PVI values than their non-holistic part-word counterparts, indicating the metric's capability to distinguish holistic linguistic units. The second analysis uncovers a significant inverse correlation between the pitch variability metrics of disyllabic words and their frequency values, highlighting a strong link between reduced prosodic prominence and frequency-based processing advantages in lexical production. Finally, the third analysis demonstrates moderated effects of retrodictive lexical contingency on pitch variability, contingent on the word's alignment with prosodic junctures. We discuss the implications of contextual predictability in lexical retrieval and its role in the dynamic planning process of speech production. Our findings underscore f0PVI as a robust prosodic measure for the automatized processing and entrenchment of linguistic units arising from repeated usage.
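The classic normalized PVI (Grabe & Low's duration-based formulation) averages the absolute difference between successive values, normalized by their local mean. Applying that same formula to per-syllable f0 values is one plausible reading of a pitch-related PVI; the abstract does not spell out the paper's two f0PVI definitions, so the sketch below is an illustrative guess, not the study's metric.

```python
def npvi(values):
    """Normalized pairwise variability index over successive values.

    Classic duration-based formulation; applied here to per-syllable f0
    (Hz) as an illustration of a pitch-related PVI. The paper's exact
    f0PVI variants are not given in the abstract."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [
        abs(a - b) / ((a + b) / 2)  # local difference over local mean
        for a, b in zip(values, values[1:])
    ]
    return 100 * sum(diffs) / len(diffs)

# A holistic disyllabic word with a smooth pitch track should score lower
# than a part-word sequence with a pitch jump between syllables:
smooth = npvi([210.0, 205.0])  # small inter-syllable f0 change
jumpy = npvi([210.0, 150.0])   # large inter-syllable f0 change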
Pub Date: 2024-07-01 DOI: 10.1016/j.wocn.2024.101340
Phonetic implementation and the interpretation of downstepping in Mainstream US English
Jill C. Thorson, Rachel Steindel Burdin
Journal of Phonetics, vol. 105, Article 101340

This study explores downstepping in Mainstream US English using three experiments. Experiment 1 investigated whether downstep was associated with accessible referents. Pairs of scenarios were constructed: one with new information and one with accessible information. Two versions of the target utterances were recorded (one with a high star contour and one with downstepping) and presented in the accessible and new contexts. The high star contour was preferred overall, but less so in accessible contexts. A statistical model showed an effect of the phonetic implementation of the contour. Experiment 2 examined the phonetic realizations of the utterances in Experiment 1 using a categorical perception discrimination task. Participants showed linear perception within the downstep contours but a categorical difference between the high star and downstep contours. Experiment 3 explored the interpretations attached to downstepping. Listeners showed a categorical difference between high star and downstep contours for interpretation, hearing downstep as indicating that something had happened before, and as more resigned, disappointed, and less clear than high star contours. There was also variation within the downstep contours based on the phonetic implementation of the contour. We show that downstep contours have distinct meanings from high star contours, and that these meanings may be mediated by their phonetic implementation.
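The contrast between categorical and linear perception in a discrimination task can be sketched from adjacent-pair accuracies along a stimulus continuum: categorical perception predicts a sharp accuracy peak at the category boundary, linear perception a flat profile. The accuracies, continuum, and thresholds below are hypothetical, not the paper's data or statistical procedure.

```python
def discrimination_peak(pair_accuracy):
    """Index of the adjacent-step pair with the highest discrimination
    accuracy; under categorical perception this sits at the boundary."""
    return max(range(len(pair_accuracy)), key=pair_accuracy.__getitem__)

def is_categorical(pair_accuracy, flatness=0.10):
    """Heuristic: call a profile 'categorical' if its peak exceeds the
    mean of the remaining pairs by more than `flatness` (proportion
    correct); otherwise treat it as linear/continuous."""
    i = discrimination_peak(pair_accuracy)
    rest = [a for j, a in enumerate(pair_accuracy) if j != i]
    return pair_accuracy[i] - sum(rest) / len(rest) > flatness

# Hypothetical accuracies for adjacent pairs along a 7-step continuum:
across_boundary = [0.55, 0.58, 0.92, 0.57, 0.54, 0.56]  # peak mid-continuum
within_downstep = [0.55, 0.56, 0.54, 0.57, 0.55, 0.56]  # flat profile
```

On these toy profiles, `across_boundary` comes out categorical and `within_downstep` linear, mirroring the Experiment 2 pattern.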
Pub Date: 2024-06-20 DOI: 10.1016/j.wocn.2024.101338
Artificial vocal learning guided by speech recognition: What it may tell us about how children learn to speak
Anqi Xu, Daniel R. van Niekerk, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Santitham Prom-on, Lorna F. Halliday, Yi Xu
Journal of Phonetics, vol. 105, Article 101338

It has long been a mystery how children learn to speak without formal instruction. Previous research has used computational modelling to help solve the mystery by simulating vocal learning with direct imitation or caregiver feedback, but has encountered difficulty in overcoming the speaker normalisation problem, namely, discrepancies between children's vocalisations and those of adults due to age-related anatomical differences. Here we show that vocal learning can be successfully simulated via recognition-guided vocal exploration without explicit speaker normalisation. We trained an articulatory synthesiser with three-dimensional vocal tract models of an adult and two child configurations of different ages to learn monosyllabic English words consisting of CVC syllables, based on coarticulatory dynamics and two kinds of auditory feedback: (i) acoustic features to simulate universal phonetic perception (or direct imitation), and (ii) a deep-learning-based speech recogniser to simulate native-language phonological perception. Native listeners were invited to evaluate the learned synthetic speech with natural speech as a baseline reference. Results show that the English words trained with the speech recogniser were more intelligible than those trained with acoustic features, sometimes close to natural speech. The successful simulation of vocal learning in this study suggests that a combination of coarticulatory dynamics and native-language phonological perception may also be critical for real-life vocal production learning.
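The core loop of recognition-guided vocal exploration can be sketched as black-box optimisation: perturb articulatory parameters, keep the perturbation if the recogniser scores the resulting audio higher for the target word. The toy hill climber below stands in for the study's setup; the real work used a 3-D articulatory synthesiser and a deep-learning recogniser, and `toy_score`, the parameter vector, and the greedy search are all assumptions for illustration.

```python
import random

def explore_vocal_space(score, init_params, steps=200, sigma=0.05, seed=0):
    """Toy recognition-guided exploration: random hill climbing over
    articulatory parameters, keeping perturbations that raise the
    recogniser's score for the target word. `score` is any black-box
    function standing in for synthesis plus recognition."""
    rng = random.Random(seed)
    best = list(init_params)
    best_score = score(best)
    for _ in range(steps):
        candidate = [p + rng.gauss(0, sigma) for p in best]
        s = score(candidate)
        if s > best_score:  # keep only improvements (greedy)
            best, best_score = candidate, s
    return best, best_score

# Stand-in "recogniser": prefers parameters near a hidden target setting.
target = [0.3, -0.2, 0.7]
def toy_score(params):
    return -sum((p - t) ** 2 for p, t in zip(params, target))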
Pub Date: 2024-06-10 DOI: 10.1016/j.wocn.2024.101339
Assessing ultrasound probe stabilization for quantifying speech production contrasts using the Adjustable Laboratory Probe Holder for UltraSound (ALPHUS)
Wei-Rong Chen, Michael C. Stern, D.H. Whalen, Donald Derrick, Christopher Carignan, Catherine T. Best, Mark Tiede
Journal of Phonetics, vol. 105, Article 101339

Ultrasound imaging of the tongue is biased by probe movement relative to the speaker's head. Two common remedies are restricting such movements or algorithmically compensating for them, each with its own challenges. We describe these challenges in detail and evaluate an open-source, adjustable probe stabilizer for ultrasound (ALPHUS), specifically designed to address them by restricting uncorrectable probe movements while allowing correctable ones (e.g., jaw opening) to facilitate naturalness. The stabilizer is highly modular and adaptable to different users (e.g., adults and children) and different research and clinical needs (e.g., imaging in both midsagittal and coronal orientations). The results of three experiments show that probe movement over uncorrectable degrees of freedom was negligible, while movement over correctable degrees of freedom that could be compensated through post-processing alignment was relatively large, indicating unconstrained articulation over parameters relevant for natural speech. Results also showed that probe movements as small as 5 mm or 2 degrees can neutralize phonemic contrasts in ultrasound tongue positions. This demonstrates that while stabilized but uncorrected ultrasound imaging can provide reliable tongue shape information (e.g., curvature or complexity), accurate tongue position (e.g., height or backness) with respect to vocal tract hard structure needs correction for probe displacement relative to the head.
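The post-processing alignment invoked for the correctable degrees of freedom amounts to undoing a rigid transform of the probe's coordinate frame. A minimal 2-D sketch, assuming a known probe displacement (translation in mm, rotation in degrees); the function name, parameters, and sign conventions are illustrative assumptions, not ALPHUS's actual correction procedure.

```python
import math

def correct_for_probe_motion(points, dx, dy, theta_deg):
    """Undo a rigid probe displacement (translation dx, dy in mm and
    rotation theta_deg in degrees) from tongue-contour points recorded
    in probe coordinates, returning head-relative coordinates."""
    th = math.radians(-theta_deg)  # inverse rotation
    cos_t, sin_t = math.cos(th), math.sin(th)
    corrected = []
    for x, y in points:
        x0, y0 = x - dx, y - dy    # undo the translation first
        corrected.append((x0 * cos_t - y0 * sin_t,
                          x0 * sin_t + y0 * cos_t))
    return corrected
```

A 5 mm probe shift, the magnitude the abstract reports as enough to neutralize phonemic position contrasts, is exactly the kind of displacement this correction removes.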
Pub Date: 2024-05-15 DOI: 10.1016/j.wocn.2024.101330
The effect of breathy voice on tone identification by listeners of different ages in Suzhou Wu Chinese
Chunyu Ge, Peggy Mok
Journal of Phonetics, vol. 105, Article 101330

Suzhou Wu Chinese has undergone a transphonologization of a voicing contrast in initial consonants to a tone contrast. As a consequence, the tone system has split into two registers: the high register tones are higher in pitch and modal-voiced, whilst the low register tones are lower in pitch and breathy-voiced. Our previous studies have found that breathy voice in the low register tones is disappearing in younger speakers' production. This finding motivated us to investigate the effect of breathy voice on tone identification across age groups. Participants from three age groups completed a tone identification experiment. Stimuli were constructed from natural tokens produced by a middle-aged female speaker and an older female speaker. Phonation was manipulated using the base syllables of both high and low register tones, for both unchecked (T1 vs. T2) and checked (T7 vs. T8) tone pairs. The results showed that breathy voice is still used by younger listeners in their perception, and that its effect on their tone identification is similar to that for older and middle-aged listeners. Moreover, the effect of breathy voice is modulated by social indexical factors (i.e., talker voice). The implications of the results for the origin of the loss of breathy voice in Suzhou Wu and for the mechanism of sound change are discussed.
Pub Date: 2024-05-13 DOI: 10.1016/j.wocn.2024.101329
The physiological basis of the phonologization of vowel nasalization: A real-time MRI analysis of American and Southern British English
Conceição Cunha, Phil Hoole, Dirk Voit, Jens Frahm, Jonathan Harrington
Journal of Phonetics, vol. 105, Article 101329

The diachronic change by which coarticulatory nasalization increases in VN (vowel-nasal) sequences has been modelled as an earlier alignment of the velum combined with oral gesture weakening of N. The model was tested by comparing American (USE) and Standard Southern British English (BRE), based on the assumption that this diachronic change is more advanced in USE. Real-time MRI data were collected from 16 USE and 27 BRE adult speakers producing monosyllables with coda /Vn, Vnd, Vnz/. For USE, nasalization was greater in V and less in N, and there was greater tongue tip lenition than for BRE. The dialects showed a similar stability of the velum gesture and a trade-off between vowel nasalization and tongue tip lenition. Velum alignment was not earlier in USE. Instead, a closer approximation of the time of the tongue tip peak velocity to the tongue tip maximum for USE caused a shift in the acoustic boundary within VN towards N, giving the illusion that the velum gesture has an earlier alignment in USE. It is suggested that coda reduction that targets the tongue tip more than the velum is a principal physiological mechanism responsible for the onset of diachronic vowel nasalization.
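The timing measure at the centre of this argument, the interval between an articulator's peak velocity and its positional maximum, can be sketched from a sampled trajectory with central differences. The trajectory below is invented, and real-time MRI analyses would typically smooth the signal first; this is a minimal illustration of the measure, not the study's pipeline.

```python
def peak_velocity_and_maximum(positions, fs):
    """Times (s) of peak absolute velocity and of the positional maximum
    for a 1-D articulator trajectory sampled at fs Hz, using central
    differences for velocity."""
    vel = [(positions[i + 1] - positions[i - 1]) * fs / 2
           for i in range(1, len(positions) - 1)]
    i_vel = max(range(len(vel)), key=lambda i: abs(vel[i])) + 1
    i_max = max(range(len(positions)), key=positions.__getitem__)
    return i_vel / fs, i_max / fs

# Invented tongue-tip raising trajectory (mm), sampled at 10 Hz:
t_vel, t_max = peak_velocity_and_maximum(
    [0.0, 1.0, 3.0, 6.0, 7.5, 8.0, 8.0], fs=10)
```

A smaller `t_max - t_vel` interval is the pattern the abstract attributes to USE: peak velocity falling closer to the tongue tip maximum.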
Pub Date: 2024-05-01 DOI: 10.1016/j.wocn.2024.101323
The phonetics of vowel intrusion in Sgi Bara
Don Daniels, Zoë Haupt, Melissa M. Baese-Berk
Journal of Phonetics, vol. 104, Article 101323

We provide a phonetic examination of intrusive vowels in Sgi Bara [jil]. These vowels are inserted in predictable places, and their quality (either [i], [ɨ], or [u]) is also predictable, so they are not considered phonemic. We demonstrate that they differ from phonemic vowels in duration (being shorter) and in articulation (being more peripheral), but not in intensity. We then demonstrate how this phonetic understanding of the difference between intrusive and phonemic vowels can be used to answer phonological questions about Sgi Bara. We offer two case studies: phonologically ambiguous sequences of high vowels, and frequent two-word combinations that may be univerbating. The results confirm the existence of a distinction between intrusive and phonemic vowels.
Pub Date : 2024-04-18DOI: 10.1016/j.wocn.2024.101327
Jieun Lee , Hanyong Park
Experiment 1 investigates whether individual differences in sensitivity to acoustic cues in L1 category perception measured by the Visual Analogue Scaling (VAS) task could explain individual variability in L2 phonological contrast learning [research question (RQ1)]. f0 is a solid cue for Korean three-way stop contrasts (i.e., lenis-aspirated stop distinction) but not for English voicing contrasts. Results showed that naïve English learners of Korean with more gradient performance in the VAS task, which was used as a proxy of f0 cue sensitivity in L1, had an advantage in L2 contrast learning. More gradient learners showed more nativelike f0 utilization during and after the High Variability Phonetic Training (HVPT), suggesting the transfer of L1 acoustic cue sensitivity to L2 learning. Experiment 2 examines whether the cue-attention switching training with L1 stimuli provided before HVPT sessions could aid learners by reallocating their attention away from the L2-irrelevant to the L2-relevant acoustic dimension (RQ2). Results demonstrated the effectiveness of the cue-attention switching training with L1 stimuli, especially to learners with less sensitivity to f0 in the VAS task. This study emphasizes the importance of considering individual differences in L2 training and shows the possibility of utilizing the VAS task as a pretraining assessment to predict the acquisition of L2 phonological contrasts and L2 cue-weighting strategies.
{"title":"Acoustic cue sensitivity in the perception of native category and their relation to nonnative phonological contrast learning","authors":"Jieun Lee , Hanyong Park","doi":"10.1016/j.wocn.2024.101327","DOIUrl":"https://doi.org/10.1016/j.wocn.2024.101327","url":null,"abstract":"<div><p>Experiment 1 investigates whether individual differences in sensitivity to acoustic cues in L1 category perception measured by the Visual Analogue Scaling (VAS) task could explain individual variability in L2 phonological contrast learning [research question (RQ1)]. f0 is a solid cue for Korean three-way stop contrasts (i.e., lenis-aspirated stop distinction) but not for English voicing contrasts. Results showed that naïve English learners of Korean with more gradient performance in the VAS task, which was used as a proxy of f0 cue sensitivity in L1, had an advantage in L2 contrast learning. More gradient learners showed more nativelike f0 utilization during and after the High Variability Phonetic Training (HVPT), suggesting the transfer of L1 acoustic cue sensitivity to L2 learning. Experiment 2 examines whether the cue-attention switching training with L1 stimuli provided before HVPT sessions could aid learners by reallocating their attention away from the L2-irrelevant to the L2-relevant acoustic dimension (RQ2). Results demonstrated the effectiveness of the cue-attention switching training with L1 stimuli, especially to learners with less sensitivity to f0 in the VAS task. This study emphasizes the importance of considering individual differences in L2 training and shows the possibility of utilizing the VAS task as a pretraining assessment to predict the acquisition of L2 phonological contrasts and L2 cue-weighting strategies.</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"104 ","pages":"Article 101327"},"PeriodicalIF":1.9,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0095447024000330/pdfft?md5=298fd21f6b274b949b25732e7a11c234&pid=1-s2.0-S0095447024000330-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140605756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-13DOI: 10.1016/j.wocn.2024.101328
Nicholas B. Aoki, Georgia Zellou
Relative to one’s default (casual) speech, clear speech contains acoustic modifications that are often perceptually beneficial. Clear speech encompasses many different styles, yet most work only compares clear and casual speech as a binary. Furthermore, the term “clear speech” is often unclear − despite variation in elicitation instructions across studies (e.g., speak clearly, imagine an L2-listener or someone with hearing loss, etc.), the generic term “clear speech” is used when interpreting results, under the tacit assumption that clear speech is monolithic. The current study examined the acoustics and intelligibility of casual speech and two clear styles (hard-of-hearing-directed and non-native-directed speech). We find: (1) the clear styles are acoustically distinct (non-native-directed speech is slower with lower mean intensity and f0); (2) the clear styles are perceptually distinct (only hard-of-hearing-directed speech enhances intelligibility); (3) no differences in intelligibility benefits are observed between L1 and L2-listeners. These results underscore the importance of considering the intended interlocutor in speaking style elicitation, leading to a discussion about the issues that arise when reference to “clear speech” lacks clarity. It is suggested that to be more clear about clear speech, greater caution should be taken when interpreting results about speaking style variation.
{"title":"Being clear about clear speech: Intelligibility of hard-of-hearing-directed, non-native-directed, and casual speech for L1- and L2-English listeners","authors":"Nicholas B. Aoki, Georgia Zellou","doi":"10.1016/j.wocn.2024.101328","DOIUrl":"https://doi.org/10.1016/j.wocn.2024.101328","url":null,"abstract":"<div><p>Relative to one’s default (casual) speech, clear speech contains acoustic modifications that are often perceptually beneficial. Clear speech encompasses many different styles, yet most work only compares clear and casual speech as a binary. Furthermore, the term “clear speech” is often <em>unclear</em> − despite variation in elicitation instructions across studies (e.g., speak clearly, imagine an L2-listener or someone with hearing loss, etc.), the generic term “clear speech” is used when interpreting results, under the tacit assumption that clear speech is monolithic. The current study examined the acoustics and intelligibility of casual speech and two clear styles (hard-of-hearing-directed and non-native-directed speech). We find: (1) the clear styles are acoustically distinct (non-native-directed speech is slower with lower mean intensity and f0); (2) the clear styles are perceptually distinct (only hard-of-hearing-directed speech enhances intelligibility); (3) no differences in intelligibility benefits are observed between L1 and L2-listeners. These results underscore the importance of considering the intended interlocutor in speaking style elicitation, leading to a discussion about the issues that arise when reference to “clear speech” lacks clarity. It is suggested that to be more <em>clear</em> about clear speech, greater caution should be taken when interpreting results about speaking style variation.</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"104 ","pages":"Article 101328"},"PeriodicalIF":1.9,"publicationDate":"2024-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0095447024000342/pdfft?md5=bd035ba46dd9b5604519609b4fb5bf11&pid=1-s2.0-S0095447024000342-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140551802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}