Previous studies on the prosodic marking of information status argue that Italian tends to resist deaccentuation of given elements. In particular, Italian reportedly always accents post-focal given information within noun phrases (NPs), so that it is not possible to reliably reconstruct the information status of the items from the acoustic signal. However, descriptions have so far been concerned with categorical accent patterns, lacking crucial information about continuous phonetic parameters and their distribution in the utterance in ways that can contribute to prosodic marking. In this paper, we use a novel approach based on periodic-energy-related measures to explore how speakers of the Neapolitan variety of Italian modulate continuous prosodic parameters to differentiate information structure. We show that, contrary to previous findings, Italian speakers of the Neapolitan variety do mark information status prosodically within noun phrases. The discrepancy with previous work is explained by the fact that the prosodic marking of post-focal givenness is not achieved through the categorical presence or absence of a pitch accent on one specific syllable, but through the gradual modulation of phonetic parameters at various locations. Moreover, we find that these modulations occur early in the noun phrase. We also show that native speakers can make use of their knowledge of these modulations to reliably identify post-focal given elements in the absence of the pragmatic context, that is, directly from the acoustic signal.
{"title":"Prosodic marking of information status in Italian","authors":"Simona Sbranna, Caterina Ventura, Aviad Albert, Martine Grice","doi":"10.1016/j.wocn.2023.101212","DOIUrl":"https://doi.org/10.1016/j.wocn.2023.101212","url":null,"abstract":"<div><p>Previous studies on the prosodic marking of information status argue that Italian tends to resist deaccentuation of given elements. In particular, Italian reportedly always accents post-focal given information within noun phrases (NPs), so that it is not possible to reliably reconstruct the information status of the items from the acoustic signal. However, descriptions have so far been concerned with categorical accent patterns, lacking crucial information about continuous phonetic parameters and their distribution in the utterance in ways that can contribute to prosodic marking. In this paper, we use a novel approach based on periodic-energy-related measures to explore how speakers of the Neapolitan variety of Italian modulate continuous prosodic parameters to differentiate information structure. We show that, contrary to previous findings, Italian speakers of the Neapolitan variety do mark information status prosodically within noun phrases. The discrepancy with previous work is explained by the fact that the prosodic marking of post-focal givenness is not achieved through the categorical presence or absence of a pitch accent on one specific syllable, but through the gradual modulation of phonetic parameters at various locations. Moreover, we find that these modulations occur early in the noun phrase. We also show that native speakers can make use of their knowledge of these modulations to reliably identify post-focal given elements in the absence of the pragmatic context, that is, directly from the acoustic signal.</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"97 ","pages":"Article 101212"},"PeriodicalIF":1.9,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49864738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-03-01. DOI: 10.1016/j.wocn.2023.101222
Yi Zheng, Arthur G. Samuel
Perceptual stability is obviously advantageous, but being able to adjust to the prevailing environment is also adaptive. Previous research has identified ways in which the categorization of speech sounds shifts as a function of recently heard speech. Dozens of studies have examined “lexically driven recalibration”, an adjustment to categorization after listeners hear a number of words with a particular speech sound designed to be perceptually ambiguous. Despite the large number of these studies, little is known about how long the adjustment endures. Using two different stimulus sets, we assess the recovery time after lexically driven recalibration. In addition, we examine whether the size of the recalibration effect diminishes during the identification test used to measure it, and whether the recalibration effect is stronger for one side of a tested contrast or the other. The effect did in fact decline during its measurement, and one side of the contrast (/s/) produced stronger shifts than the other (/ʃ/ or /θ/) under the conditions typically examined in recalibration studies. Recalibration was quite robust after 24 hours for both stimulus sets, and still measurable after one week for one of them. This time course is strikingly different from the recovery times reported in previous studies for two other adjustment processes – selective adaptation and audiovisually driven recalibration. The vastly different time courses pose a major challenge for models that ascribe these phenomena to the same adjustment function. Thus, such models will need to be substantially modified, or alternative models will need to be developed.
{"title":"Flexibility and stability of speech sounds: The time course of lexically-driven recalibration","authors":"Yi Zheng , Arthur G. Samuel","doi":"10.1016/j.wocn.2023.101222","DOIUrl":"https://doi.org/10.1016/j.wocn.2023.101222","url":null,"abstract":"<div><p>Perceptual stability is obviously advantageous, but being able to adjust to the prevailing environment is also adaptive. Previous research has identified ways in which the categorization of speech sounds shifts as a function of recently heard speech. Dozens of studies have examined “lexically driven recalibration”, an adjustment to categorization after listeners hear a number of words with a particular speech sound designed to be perceptually ambiguous. Despite the large number of these studies, little is known about how long the adjustment endures. Using two different stimulus sets, we assess the recovery time after lexically driven recalibration. In addition, we examine whether the size of the recalibration effect diminishes during the identification test used to measure it, and whether the recalibration effect is stronger for one side of a tested contrast or the other. The effect did in fact decline during its measurement, and one side of the contrast (/s/) produced stronger shifts than others (/ʃ/ or /θ/) under the conditions typically examined in recalibration studies. Recalibration was quite robust after 24 hours for both stimulus sets, and still measurable after one week for one of them. This time course is strikingly different than the recovery times reported in previous studies for two other adjustment processes – selective adaptation and audiovisually driven recalibration. The vastly different time courses pose a major challenge for models that ascribe these phenomena to the same adjustment function. Thus, such models will need to be substantially modified, or alternative models will need to be developed.</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"97 ","pages":"Article 101222"},"PeriodicalIF":1.9,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49816843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-03-01. DOI: 10.1016/j.wocn.2023.101223
Luis M.T. Jesus, Sara Castilho, Aníbal Ferreira, Maria Conceição Costa
Purpose
The acoustic signal attributes of whispered speech potentially carry sufficiently distinct information to define vowel spaces and to disambiguate consonant place and voicing, but it is not fully known what these attributes are or which production mechanisms underlie them. The purpose of this study was to define segmental cues to vowel height and to the place and voicing of sibilant fricatives, and to develop an articulatory interpretation of acoustic data.
Method
Seventeen speakers produced sustained sibilants and oral vowels, disyllabic words and sentences, and read a phonetically balanced text. All the tasks were repeated in voiced and whispered speech, and the sound source and filter were analysed using the following parameters: fundamental frequency, spectral peak frequencies and levels, spectral slopes, sound pressure level and durations. Logistic linear mixed-effects models were developed to understand what acoustic signal attributes carry sufficiently distinct information to disambiguate /i, a/ and /s, ʃ/.
Results
Vowels were produced with significantly different spectral slope, sound pressure level, and first and second formant frequencies in voiced and whispered speech. The low-frequency spectral slope of voiced sibilants was significantly different between whispered and voiced speech. The odds of choosing /a/ instead of /i/ were estimated to be lower for whispered speech when compared to voiced speech. Fricatives’ broad peak frequency was a statistically significant predictor when discriminating between /s/ and /ʃ/.
Conclusions
First formant frequency and relative duration of vowels are consistently used as height cues, and spectral slope and broad peak frequency are attributes associated with consonantal place of articulation. The relative duration of same-place voiceless fricatives was higher than that of voiced fricatives in both voiced and whispered speech. The evidence presented in this paper can be used to restore voiced speech signals, and to inform rehabilitation strategies that can safely explore the production mechanisms of whispering.
{"title":"Discriminative segmental cues to vowel height and consonantal place and voicing in whispered speech","authors":"Luis M.T. Jesus , Sara Castilho , Aníbal Ferreira , Maria Conceição Costa","doi":"10.1016/j.wocn.2023.101223","DOIUrl":"https://doi.org/10.1016/j.wocn.2023.101223","url":null,"abstract":"<div><h3>Purpose</h3><p>The acoustic signal attributes of whispered speech potentially carry sufficiently distinct information to define vowel spaces and to disambiguate consonant place and voicing, but what these attributes are and the underlying production mechanisms are not fully known. The purpose of this study was to define segmental cues to place and voicing of vowels and sibilant fricatives and to develop an articulatory interpretation of acoustic data.</p></div><div><h3>Method</h3><p>Seventeen speakers produced sustained sibilants and oral vowels, disyllabic words, sentences and read a phonetically balanced text. All the tasks were repeated in voiced and whispered speech, and the sound source and filter analysed using the following parameters: Fundamental frequency, spectral peak frequencies and levels, spectral slopes, sound pressure level and durations. Logistic linear mixed-effects models were developed to understand what acoustic signal attributes carry sufficiently distinct information to disambiguate /i, a/ and /s, ʃ/.</p></div><div><h3>Results</h3><p>Vowels were produced with significantly different spectral slope, sound pressure level, first and second formant frequencies in voiced and whispered speech. The low frequencies spectral slope of voiced sibilants was significantly different between whispered and voiced speech. The odds of choosing /a/ instead of /i/ were estimated to be lower for whispered speech when compared to voiced speech. Fricatives’ broad peak frequency was statistically significant when discriminating between /s/ and /ʃ/.</p></div><div><h3>Conclusions</h3><p>First formant frequency and relative duration of vowels are consistently used as height cues, and spectral slope and broad peak frequency are attributes associated with consonantal place of articulation. The relative duration of same-place voiceless fricatives was higher than voiced fricatives both in voiced and whispered speech. The evidence presented in this paper can be used to restore voiced speech signals, and to inform rehabilitation strategies that can safely explore the production mechanisms of whispering.</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"97 ","pages":"Article 101223"},"PeriodicalIF":1.9,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49816846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-03-01. DOI: 10.1016/j.wocn.2023.101213
Valerie Freeman
Vowel merger production is quantified with gradient acoustic measures, while phonemic perception methods are often coarser, complicating comparisons within mergers in progress. This study implements a perception experiment in two-dimensional formant space (F1 × F2), allowing unified plotting, quantification, and statistics with production data. Production and perception are compared within 20 speakers for a two-part prevelar merger in progress in Pacific Northwest English, where mid-front /ɛ, e/ approximate or merge before voiced velar /ɡ/ (leg–vague merger), and low-front prevelar /æɡ/ raises toward them (bag-raising). Distributions are visualized with kernel density plots and overlap quantified with Pillai scores and confusion matrices from linear discriminant analysis models. Results suggest that leg–vague merger is perceived as more complete than it is produced (in both the sample and community), while bag-raising is highly variable in production but rejected in perception. Relationships between production and perception varied by age, with raising and merger progressing across two generations in production but not perception, followed by younger adults perceiving leg–vague merger but not producing it and varying in (minimal) raising perception while varying in bag-raising in production. Thus, prevelar raising/merger may be progressing among some social groups but reversing in others.
{"title":"Production and perception of prevelar merger: Two-dimensional comparisons using Pillai scores and confusion matrices","authors":"Valerie Freeman","doi":"10.1016/j.wocn.2023.101213","DOIUrl":"10.1016/j.wocn.2023.101213","url":null,"abstract":"<div><p>Vowel merger production is quantified with gradient acoustic measures, while phonemic perception methods are often coarser, complicating comparisons within mergers in progress. This study implements a perception experiment in two-dimensional formant space (F1 × F2), allowing unified plotting, quantification, and statistics with production data. Production and perception are compared within 20 speakers for a two-part prevelar merger in progress in Pacific Northwest English, where mid-front /ɛ, e/ approximate or merge before voiced velar /ɡ/ (<span>leg–vague</span> merger), and low-front prevelar /æɡ/ raises toward them (<span>bag-</span>raising). Distributions are visualized with kernel density plots and overlap quantified with Pillai scores and confusion matrices from linear discriminant analysis models. Results suggest that <span>leg–vague</span> merger is perceived as more complete than it is produced (in both the sample and community), while <span>bag-</span>raising is highly variable in production but rejected in perception. Relationships between production and perception varied by age, with raising and merger progressing across two generations in production but not perception, followed by younger adults perceiving <span>leg–vague</span> merger but not producing it and varying in (minimal) raising perception while varying in <span>bag</span>-raising in production. Thus, prevelar raising/merger may be progressing among some social groups but reversing in others.</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"97 ","pages":"Article 101213"},"PeriodicalIF":1.9,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9879351/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10576296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-03-01. DOI: 10.1016/j.wocn.2022.101210
Zhe-chen Guo, Rajka Smiljanic
Overlap of adjacent articulatory gestures leads to coarticulation. Understanding how hyperarticulated intelligibility-enhancing clear speech modifications affect coarticulation can inform theories of phonetic variation and speech intelligibility. However, prior research yielded mixed findings regarding the relationship between hyperarticulation and coarticulatory patterns. This study extends previous work by analyzing the degree of coarticulation across several different communicative conditions in the LUCID corpus (Baker & Hazan, 2010). Southern British English speakers completed an interactive spot-the-difference task with a partner with and without a communicative barrier (e.g., speech degraded by talker babble). They also read sentences casually and clearly without an interlocutor. Diphones in keywords produced in both tasks were analyzed using two whole-spectrum measures, with greater spectral distance and shorter coarticulatory overlap between the diphones indexing less coarticulation. Results revealed that speakers coarticulated less in response to both real (interactive task) and imaginary (sentence-reading) communicative challenges. Speakers furthermore varied the degree of coarticulatory resistance across the different real communicative barriers. Diphones with greater consonant articulatory constraint were less sensitive to differences between the conditions, suggesting a limit to the hyperarticulation-induced phonetic variation. The findings agree with models of targeted speaker adaptation that assume coarticulatory resistance in hyperarticulated clear speech (the H&H theory: Lindblom, 1990).
{"title":"Speakers coarticulate less in response to both real and imagined communicative challenges: An acoustic analysis of the LUCID corpus","authors":"Zhe-chen Guo, Rajka Smiljanic","doi":"10.1016/j.wocn.2022.101210","DOIUrl":"https://doi.org/10.1016/j.wocn.2022.101210","url":null,"abstract":"<div><p>Overlap of adjacent articulatory gestures leads to coarticulation. Understanding how hyperarticulated intelligibility-enhancing clear speech modifications affect coarticulation can inform theories of phonetic variation and speech intelligibility. However, prior research yielded mixed findings regarding the relationship between hyperarticulation and coarticulatory patterns. This study extends previous work by analyzing the degree of coarticulation across several different communicative conditions in the LUCID corpus (<span>Baker & Hazan, 2010</span>). Southern British English speakers completed an interactive spot-the-difference task with a partner with and without a communicative barrier (e.g., speech degraded by talker babble). They also read sentences without an interlocutor casually and clearly. Diphones in keywords produced in both tasks were analyzed using two whole-spectrum measures, with greater spectral distance and shorter coarticulatory overlap between the diphones indexing less coarticulation. Results revealed that speakers coarticulated less in response to both real (interactive task) and imaginary (sentence-reading) communicative challenges. Speakers furthermore varied the degree of coarticulatory resistance in different real communicative barriers. Diphones with greater consonant articulatory constraint were less sensitive to differences between the conditions, suggesting a limit to the hyperarticulation-induced phonetic variation. The findings agree with the models of targeted speaker adaptations assuming coarticulatory resistance in hyperarticulated clear speech (the H&H theory: <span>Lindblom, 1990</span>).</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"97 ","pages":"Article 101210"},"PeriodicalIF":1.9,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49816840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-03-01. DOI: 10.1016/j.wocn.2023.101224
Vincent Hughes, Amanda Cardoso, Paul Foulkes, Peter French, Amelia Gully, Philip Harrison
This study examines the extent to which speaker-specific information is encoded in different features of vocal output and the relationships between those features. A range of acoustic features, grouped as source (laryngeal voice quality measures and fundamental frequency) and filter features (formants and Mel-frequency cepstral coefficients; MFCCs), were extracted from the vocalic portion of the hesitation marker um for 90 male speakers of Standard Southern British English. Little overall correlation between the sets of features was observed, suggesting no strong interdependence between source and filter in our data. Although filter features were consistently better at discriminating between same- and different-speaker pairs compared with source features, combining source and filter has the potential of producing the lowest error rates and the strongest speaker discrimination scores. Taken together, results show that source and filter provide complementary speaker-specific information. However, the extent of the improvements in speaker discrimination performance when combining source and filter varied across speakers. We explore potential explanations for this finding and discuss the implications for source-filter theory, and for applied fields such as speaker recognition and forensic speech science.
{"title":"Speaker-specificity in speech production: The contribution of source and filter","authors":"Vincent Hughes , Amanda Cardoso , Paul Foulkes , Peter French , Amelia Gully , Philip Harrison","doi":"10.1016/j.wocn.2023.101224","DOIUrl":"https://doi.org/10.1016/j.wocn.2023.101224","url":null,"abstract":"<div><p>This study examines the extent to which speaker-specific information is encoded in different features of vocal output and the relationships between those features. A range of acoustic features, grouped as source (laryngeal voice quality measures and fundamental frequency) and filter features (formants and Mel-frequency cepstral coefficients; MFCCs), were extracted from the vocalic portion of the hesitation marker <em>um</em> for 90 male speakers of Standard Southern British English. Little overall correlation between the sets of features was observed, suggesting no strong interdependence between source and filter in our data. Although filter features were consistently better at discriminating between same- and different-speaker pairs compared with source features, combining source and filter has the potential of producing the lowest error rates and the strongest speaker discrimination scores. Taken together, results show that source and filter provide complementary speaker-specific information. However, the extent of the improvements in speaker discrimination performance when combining source and filter varied across speakers. We explore potential explanations for this finding and discuss the implications for source-filter theory, and for applied fields such as speaker recognition and forensic speech science.</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"97 ","pages":"Article 101224"},"PeriodicalIF":1.9,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49816844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. DOI: 10.1016/j.wocn.2022.101181
Anita Lorenc, Marzena Żygis, Łukasz Mik, Daniel Pape, Márton Sóskuthy
The present paper investigates articulatory and acoustic variation in Polish palatalised retroflex sibilants compared with their plain counterparts. It tests the hypothesis advanced by Hamann (2003: 44) that palatalised retroflexes are non-existent and that retroflexes in Polish change to palato-alveolars [ʃ ʒ t͡ʃ d͡ʒ] when palatalised. Based on articulatory data from 20 speakers, we provide evidence that at least part of the data (53.5%) are palatalised retroflexes [ʂʲ ʐʲ ʈ͡ʂʲ ɖ͡ʐʲ]. The plain counterparts are shown to be retroflex, as proposed by Hamann (2003).
Our averaged results indicate that both palatalised and plain retroflexes show a convex tongue shape. However, individual data reveals a wide range of realisations, from a bunched dorsum to flat and even hollowed tongue shapes. Taking this variability into account, we propose a new tongue shape classification based on Heron’s Formula – i.e. concave, slightly concave, flat, convex and slightly convex. The different tongue shapes are also visualised in the form of videos created using GAMMs.
Regarding acoustic results, our analysis reveals that the strongest correlate of palatalisation in retroflex sibilants is a longer duration of frication, followed by a higher Centre of Gravity (COG) and m1 spectral slope.
{"title":"Articulatory and acoustic variation in Polish palatalised retroflexes compared with plain ones","authors":"Anita Lorenc , Marzena Żygis , Łukasz Mik , Daniel Pape , Márton Sóskuthy","doi":"10.1016/j.wocn.2022.101181","DOIUrl":"https://doi.org/10.1016/j.wocn.2022.101181","url":null,"abstract":"<div><p>The present paper investigates articulatory and acoustic variation in Polish palatalised retroflex sibilants compared with their plain counterparts. It tests the hypothesis advanced by Hamann (2003: 44) that palatalised retroflexes are non-existent and that retroflexes in Polish change to palato-alveolars [ʃ ʒ t͡ʃ d͡ʒ] when being palatalised. Based on articulatory data from 20 speakers we provide evidence that at least part of the data (53.5%) are palatalised retroflexes [ʂʲ ʐʲ ʈ͡ʂʲ ɖ͡ʐʲ]. The plain counterparts are shown to be retroflex, as proposed by Hamann (2003).</p><p>Our averaged results indicate that both palatalised and plain retroflexes show a convex tongue shape. However, individual data reveals a wide range of realisations, from a bunched dorsum to flat and even hollowed tongue shapes. Taking this variability into account, we propose a new tongue shape classification based on Heron’s Formula – i.e. concave, slightly concave, flat, convex and slightly convex. The different tongue shapes are also visualised in the form of videos created using GAMMs.</p><p>Regarding acoustic results, our analysis reveals that the strongest correlate of palatalised retroflex sibilants is longer duration of frication in palatalised sibilants followed by higher Centre of Gravity (COG) and m1 spectral slope.</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"96 ","pages":"Article 101181"},"PeriodicalIF":1.9,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49754776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. DOI: 10.1016/j.wocn.2022.101198
Uriel Cohen Priva, Emily Strand
Is American English schwa’s position determined solely by the context in which it appears? Do vowels neutralize to schwa when their duration is shorter? We address these two inter-related questions using the Buckeye corpus to study vowel behavior across multiple contexts of spontaneous speech. We find that all except tense high vowels shift to lower F1 values when their duration is relatively short, including lax high vowels and lexical schwas, rather than toward a mid-vowel position that schwa occupies when its duration is long. However, we also replicate the finding that schwa is more dependent on both context and duration than other vowels. The results are not consistent with the idea that schwa’s position is determined exclusively by the context in which it appears. However, schwa’s shift to higher F1 values when its duration is longer is not necessarily different from other vowels’ shift to higher F1 values when their duration is longer, making it unnecessary to argue that schwa’s mid-vowel properties are due to having a target in F1 terms.
{"title":"Schwa’s duration and acoustic position in American English","authors":"Uriel Cohen Priva, Emily Strand","doi":"10.1016/j.wocn.2022.101198","DOIUrl":"https://doi.org/10.1016/j.wocn.2022.101198","url":null,"abstract":"<div><p>Is American English schwa’s position determined solely by the context in which it appears? Do vowels neutralize to schwa when their duration is shorter? We address these two inter-related questions using the Buckeye corpus to study vowel behavior across multiple contexts of spontaneous speech. We find that all except tense high vowels shift to lower F1 values when their duration is relatively short, including lax high vowels and lexical schwas, rather than toward a mid-vowel position that schwa occupies when its duration is long. However, we also replicate the finding that schwa is more dependent on both context and duration than other vowels. The results are not consistent with the idea that schwa’s position is determined exclusively by the context in which it appears. However, schwa’s shift to higher F1 values when its duration is longer is not necessarily different from other vowels’ shift to higher F1 values when their duration is longer, making it unnecessary to argue that schwa’s mid-vowel properties are due to having a target in F1 terms.</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"96 ","pages":"Article 101198"},"PeriodicalIF":1.9,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49760260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. DOI: 10.1016/j.wocn.2022.101200
Constantijn Kaland, Marc Swerts, Nikolaus P. Himmelmann
The prosody of Papuan Malay, spoken in the easternmost provinces of Indonesia, is not fully described and understood. The limited work available suggests that phrase prosody in this language is different from that of other well-studied (West-Germanic) languages. However, not much is known about possible correlates of focus marking, for which prosody is used extensively in languages like Dutch and English. To gain insight into universal and specific usages of prosody, this study reports two identical production experiments and acoustic analyses carried out for Papuan Malay and Dutch, to investigate the prosody of noun phrases in different contrastive focus conditions. Participants in the experiments described pictures with different shapes and colors using specific matrix phrases. The prosody of these descriptions was examined by time-series measures of f0 and statistically analysed using generalised additive mixed models (GAMMs). Results show that speakers of Papuan Malay do not use f0 to mark contrastively focused noun phrases, unlike Dutch speakers. The main function of f0 in Papuan Malay phrases appears to be boundary marking on the final syllable in the phrase, a function also observed in Dutch. In addition, the pre-final syllable in the Papuan Malay phrase was always marked with a rising f0, whereas in Dutch an interaction between the boundary and focus marking was found. The results are discussed from a typological perspective and provide new insights into the prosody of Papuan Malay.
{"title":"Red and blue bananas: Time-series f0 analysis of contrastively focused noun phrases in Papuan Malay and Dutch","authors":"Constantijn Kaland , Marc Swerts , Nikolaus P. Himmelmann","doi":"10.1016/j.wocn.2022.101200","DOIUrl":"https://doi.org/10.1016/j.wocn.2022.101200","url":null,"abstract":"<div><p>The prosody of Papuan Malay, spoken in the easternmost provinces of Indonesia, is not fully described and understood. The limited work available suggests that phrase prosody in this language is different from other well-studied (West-Germanic) languages. However, not much is known about possible correlates of focus marking, for which prosody is used extensively in languages like Dutch and English. To gain insight into universal and specific usages of prosody, this study reports two identical production experiments and acoustic analyses carried out for Papuan Malay and Dutch, to investigate the prosody of noun phrases in different contrastive focus conditions. Participants in the experiments described pictures with different shapes and colors using specific matrix phrases. The prosody of these descriptions was examined by time-series measures of f0 and statistically analysed using generalised additive mixed models (GAMMs). Results show that speakers of Papuan Malay do not use f0 to mark contrastively focused noun phrases, unlike Dutch speakers. The main function of f0 in Papuan Malay phrases appears to be boundary marking on the final syllable in the phrase, a function also observed in Dutch. In addition, the pre-final syllable in the Papuan Malay phrase was always marked with a rising f0, whereas in Dutch an interaction between the boundary and focus marking was found. The results are discussed in a typological perspective and provide new insights into the prosody of Papuan Malay.</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"96 ","pages":"Article 101200"},"PeriodicalIF":1.9,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49754732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-01. DOI: 10.1016/j.wocn.2022.101199
Juqiang Chen, Mark Antoniou, Catherine T. Best
The present study examined native language phonological and phonetic factors in non-native lexical tone perception by tone language listeners, manipulating memory load and stimulus variability to bias listeners towards a more phonological or more phonetic mode of perception. Mandarin and Vietnamese listeners categorised the five Thai lexical tones in terms of their native tones, and discriminated five selected Thai tone contrasts that were predicted by the Perceptual Assimilation Model (PAM, Best, 1995) to be discriminated differently. Categorisation responses showed more phonologically-based patterns under high than low memory load but were unaffected by talker and vowel variability, whereas discrimination accuracy was reduced by talker and vowel variability but not by memory load. Phonological factors indicated by type of categorisation and category overlap generally predicted the discrimination of non-native tone contrasts in line with PAM principles. Phonetic factors reflected in category overlap scores and fit index difference scores predicted variations in discriminating contrasts of the same contrast categorisation type. These findings uphold the extension of PAM principles to non-native tone perception by native listeners of other tone languages. Native phonological and phonetic contributions to non-native speech perception differ between categorisation and discrimination tasks, as reflected in differential modulation by memory load and stimulus variability.
{"title":"Phonological and phonetic contributions to perception of non-native lexical tones by tone language listeners: Effects of memory load and stimulus variability","authors":"Juqiang Chen , Mark Antoniou , Catherine T. Best","doi":"10.1016/j.wocn.2022.101199","DOIUrl":"https://doi.org/10.1016/j.wocn.2022.101199","url":null,"abstract":"<div><p>The present study examined native language phonological and phonetic factors in non-native lexical tone perception by tone language listeners, manipulating memory load and stimulus variability to bias listeners towards a more phonological or more phonetic mode of perception. Mandarin and Vietnamese listeners categorised the five Thai lexical tones to their native tones, and discriminated five selected Thai tone contrasts that were predicted by the Perceptual Assimilation Model (PAM, <span>Best, 1995</span>) to be discriminated differently. Categorisation responses showed more phonologically-based patterns under high than low memory load but were unaffected by talker and vowel variability, whereas discrimination accuracy was reduced by talker and vowel variability but not by memory load. Phonological factors indicated by type of categorisation and category overlap generally predicted the discrimination of non-native tone contrasts in line with PAM principles. Phonetic factors reflected in category overlap scores and fit index difference scores predicted variations in discriminating contrasts of the same contrast categorisation type. These findings uphold the extension of PAM principles to non-native tone perception by native listeners of other tone languages. Native phonological and phonetic contributions to non-native speech perception differ between categorisation and discrimination tasks, as reflected in differential modulation by memory load and stimulus variability.</p></div>","PeriodicalId":51397,"journal":{"name":"Journal of Phonetics","volume":"96 ","pages":"Article 101199"},"PeriodicalIF":1.9,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49760261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}