Pub Date : 2022-05-23 DOI: 10.21437/speechprosody.2022-69
Marie-Anne Morand, M. Bruno, Sandra Schwab, Stephan Schmid
Multiethnolectal ways of speaking have been emerging for 30 years in culturally and linguistically diverse neighborhoods of European cities, including Zurich (Switzerland). Among the prosodic features of Germanic multiethnolects, a so-called ‘staccato’ rhythm has been mentioned in several studies. For instance, a comparison between two groups of adolescents (12 speakers each) showed that speakers of multiethnolectal Zurich German displayed slower syllable rates and less vowel duration variability than speakers of a rather traditional dialect. This study compares syllable rate and speech rhythm metrics (nPVI-V, nPVI-C) in spontaneous and read speech of 48 Zurich German adolescents. In a regression analysis, rhythmic measures were compared with the perception of how multiethnolectal the speakers sounded (rating score). The results showed that syllable rate and nPVI-V were related to rating score independently of speaking style (read, spontaneous speech): speakers who were perceived as more multiethnolectal had a slower syllable rate and less vowel duration variability. Such findings were not observed for nPVI-C. These results suggest that syllable rate and speech rhythm (at least vowel duration variability) are stable phonetic features of multiethnolectal Zurich German, since the relationship between these features and the perception of multiethnolectal speech was observed in both read and spontaneous speech.
{"title":"Syllable rate and speech rhythm in multiethnolectal Zurich German: a comparison of speaking styles","authors":"Marie-Anne Morand, M. Bruno, Sandra Schwab, Stephan Schmid","doi":"10.21437/speechprosody.2022-69","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-69","journal":{"name":"Speech Prosody 2022"},"publicationDate":"2022-05-23","platform":"Semanticscholar","paperid":"125939024"}
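The two measures named in the abstract above can be illustrated with a minimal sketch. It assumes the standard definition of the normalized Pairwise Variability Index (the mean of the absolute differences between successive interval durations, each divided by the mean of the pair, times 100) and a plain syllables-per-second rate. The function names `npvi` and `syllable_rate` are hypothetical, and the authors' exact segmentation and computation may differ.

```python
from typing import Sequence


def npvi(durations: Sequence[float]) -> float:
    """Normalized Pairwise Variability Index of successive interval durations.

    Assumes all durations are positive (e.g. seconds or milliseconds).
    Pass vocalic intervals for nPVI-V, consonantal intervals for nPVI-C.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    # One term per adjacent pair: |d_k - d_{k+1}| normalized by the pair mean.
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(terms) / len(terms)


def syllable_rate(n_syllables: int, total_seconds: float) -> float:
    """Syllables per second over a stretch of speech (hypothetical helper)."""
    if total_seconds <= 0:
        raise ValueError("duration must be positive")
    return n_syllables / total_seconds
```

Under this definition, a sequence of equal vowel durations yields an nPVI-V of 0, so lower values correspond to the reduced vowel duration variability the abstract associates with multiethnolectal speech.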
Pub Date : 2022-05-23 DOI: 10.21437/speechprosody.2022-90
Mortaza Taheri-Ardali, D. Hirst
OMProDat is an open multilingual prosodic database, which aims to collect, archive and distribute recordings and annotations of directly comparable data from different languages. As part of the OMProDat project, this paper focuses on the creation of a bilingual Persian-English prosodic database read by native speakers of Persian. This collection contains 40 continuous, thematically connected paragraphs, each of five sentences, originally created during the European SAM project. Our collection was recorded by 5 male and 5 female speakers of standard Persian, all from monolingual families. The Persian texts were romanised and transcribed phonetically using the ASCII phonetic alphabet SAMPA. The database includes TextGrid annotations, which will be obtained semi-automatically from the sound and the orthographic transcription using the SPPAS alignment software. The Momel and INTSINT algorithms will be used to provide prosodic annotation of the corpus. This considerable amount of data will allow us to compare the production of Persian and English as L1 and L2, respectively. In addition, a cross-linguistic comparison with other languages in OMProDat is readily feasible.
{"title":"Building a Persian-English OMProDat Database Read by Persian Speakers","authors":"Mortaza Taheri-Ardali, D. Hirst","doi":"10.21437/speechprosody.2022-90","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-90","journal":{"name":"Speech Prosody 2022"},"publicationDate":"2022-05-23","platform":"Semanticscholar","paperid":"129915457"}
Pub Date : 2022-05-23 DOI: 10.21437/speechprosody.2022-61
J. Cole, Jeremy Steffman, Sam Tilsen
In Autosegmental-Metrical models of intonational phonology, pitch accents, phrase accents and boundary tones may combine freely to create a predicted set of phonologically distinct phrase-final “nuclear” tunes. In this study we ask if an 8-way distinction in nuclear tune shape in American English, predicted from combinations of 2 (monotonal) pitch accents, 2 phrase accents and 2 boundary tones, is manifest in speech production and in speech perception. F0 trajectories from an imitative speech production experiment were analyzed using (i) neural net classification, and (ii) human listeners’ perceptual discrimination of the model utterances. Pairwise classification accuracy of the imitative productions is highest for tune pairs that differ in holistic shape (high-rising vs. rise-fall), and poorest for tunes with the same shape that differ in (higher vs. lower) final f0. Perception results show a similar pattern, with poor pairwise discrimination for tunes that differ primarily, but by a small degree, in final f0. Together the results suggest a hierarchy of distinctiveness among nuclear tunes, with a robust distinction based on holistic tune shape, which only partly aligns with distinctions in tonal specification, and a weak/poorly differentiated distinction between tunes with the same holistic shape but small differences in final f0.
{"title":"Shape matters: Machine classification and listeners’ perceptual discrimination of American English intonational tunes","authors":"J. Cole, Jeremy Steffman, Sam Tilsen","doi":"10.21437/speechprosody.2022-61","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-61","journal":{"name":"Speech Prosody 2022"},"publicationDate":"2022-05-23","platform":"Semanticscholar","paperid":"129701349"}
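The 8-way distinction mentioned in the abstract above follows from free combination: 2 pitch accents × 2 phrase accents × 2 boundary tones. The sketch below enumerates that inventory. The tone labels (`H*`, `L*`, `H-`, `L-`, `H%`, `L%`) are an assumption borrowed from ToBI-style notation, since the abstract itself does not name the tones, and `nuclear_tunes` is a hypothetical helper.

```python
from itertools import product

# Assumed ToBI-style labels; the abstract only gives the counts (2 x 2 x 2).
PITCH_ACCENTS = ("H*", "L*")
PHRASE_ACCENTS = ("H-", "L-")
BOUNDARY_TONES = ("H%", "L%")


def nuclear_tunes() -> list[str]:
    """Return every pitch accent + phrase accent + boundary tone combination."""
    return [
        "".join(parts)
        for parts in product(PITCH_ACCENTS, PHRASE_ACCENTS, BOUNDARY_TONES)
    ]


if __name__ == "__main__":
    for tune in nuclear_tunes():
        print(tune)
```

Pairwise comparison of these eight tunes gives 28 distinct tune pairs (8 choose 2), which is the space over which pairwise classification or discrimination can be evaluated.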
Pub Date : 2022-05-23 DOI: 10.21437/speechprosody.2022-85
Wenwei Xu, Chunyu Ge, Wentao Gu, P. Mok
Previous studies have established that phonation contrasts can be, apart from pitch, an important dimension of tonal contrasts in some languages, and modern Wu Chinese is a good example in which the lower register tones are produced with breathier phonation than the upper register tones. Nevertheless, researchers have shown that such phonation contrast is declining among young speakers in Shanghai and Suzhou Wu. This pilot study is thus motivated to investigate children’s production in Kunshan Wu, a neighboring yet rather understudied dialect with more tones, in order to see if a similar trend is ongoing. Two male and two female school-age children (8;4 to 10;4) were recorded reading isolated monosyllabic words with different lexical tones, and simultaneous acoustic and electroglottographic (EGG) data were collected. Results of EGG and acoustic parameters demonstrate that at least near the onset of the vowel, glottal constriction is smaller and glottal closure is less abrupt in the lower register tones than in the upper register tones, suggesting that the lower register tones are generally produced with breathier phonation. Therefore, school-age child speakers of Kunshan Wu are still able to produce the phonation contrast between the tone registers.
{"title":"A preliminary analysis on children’s phonation contrast in Kunshan Wu Chinese tones","authors":"Wenwei Xu, Chunyu Ge, Wentao Gu, P. Mok","doi":"10.21437/speechprosody.2022-85","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-85","journal":{"name":"Speech Prosody 2022"},"publicationDate":"2022-05-23","platform":"Semanticscholar","paperid":"121591501"}
Pub Date : 2022-05-23 DOI: 10.21437/speechprosody.2022-128
Hua Wei, Yifei He, C. Kauschke, Mathias Scharinger, Ulrike Domahs
Previous behavioral studies on the processing of emotional prosody in L2 learners showed similarities and differences between L1- and L2-processing and suggested that emotional perception has both universal and culture-specific aspects. However, little is known about the processing of emotional prosody in L2 learners' brains. Therefore, the present study used event-related potentials to compare the processing of emotional prosodies between German native speakers and Chinese L2 learners of German. Participants performed a prosody recognition task with semantically neutral German words recorded with emotional "neutral", "like", and "disgust" prosodies. The L2 learners categorized the emotional prosodies with above-chance accuracy, but the L1 speakers were significantly more accurate. Both groups yielded an early and a late positivity for processing "like" in comparison to "disgust", reflecting the emotional prosodic predictive processing. However, an early left anterior negativity (ELAN) and a late anterior negativity observed in the L2 learners suggest that they are more sensitive to acoustic differences of the presented stimuli. Overall, our findings support the assumption that the processing of emotional prosody is in principle universal across languages, but that, in addition to the general mechanisms involved in the processing of emotional speech, language-specific aspects also modify emotional processing.
{"title":"An EEG-study on L2 categorization of emotional prosody in German","authors":"Hua Wei, Yifei He, C. Kauschke, Mathias Scharinger, Ulrike Domahs","doi":"10.21437/speechprosody.2022-128","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-128","journal":{"name":"Speech Prosody 2022"},"publicationDate":"2022-05-23","platform":"Semanticscholar","paperid":"115926113"}
Pub Date : 2022-05-23 DOI: 10.21437/speechprosody.2022-154
Christine T. Röhr, Michelina Savino, M. Grice
This paper uses a serial recall task to investigate the role of rising intonation in the allocation of attentional resources in German. It has been shown for Italian that rising intonation at prosodic boundaries enhances recall of digits in auditorily presented lists. Since resources are usually allocated to prominent items, and since pitch accents are primary encoders of prominence in both languages, we investigate whether an accentual rise leads to better recall than a boundary rise. In a serial recall task on nine-digit sequences in German we compare the effect on working memory of sequences grouped by marking the last item of the two non-final triplets with (i) a high/rising accent followed by an equally high boundary, (ii) a low accent followed by a boundary rise, or (iii) a low/falling accent-boundary sequence, as compared to (iv) ungrouped sequences as controls. Results reveal that items with a rise are recalled more accurately than items without a rise, with no evidence for superior recall of items with accent rises over those with boundary rises. However, boundary rises appear to facilitate recall over a larger domain than accentual rises.
{"title":"The effect of intonational rises on serial recall in German","authors":"Christine T. Röhr, Michelina Savino, M. Grice","doi":"10.21437/speechprosody.2022-154","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-154","journal":{"name":"Speech Prosody 2022"},"publicationDate":"2022-05-23","platform":"Semanticscholar","paperid":"126692316"}
Pub Date : 2022-05-23 DOI: 10.21437/speechprosody.2022-166
Sichang Gao, Mingwei Pan
This study aims to develop a rating scale for evaluating speech prosody of learners of Chinese as a second language (CSL). The researchers first gathered 41 descriptors that were perceived as crucial indicators of prosody ability by interviewing ten CSL teachers and analyzing existing Chinese speaking proficiency scales from five universities in Mainland China. After ninety-four CSL teachers rated their perception of the selected descriptors and four expert teachers were consulted, 15 of the 41 descriptors remained to form a rating scale. Principal component analysis revealed that the 15 descriptors, grouped into three dimensions (prosodic strategic competence, fluency, prosodic naturalness), could meaningfully describe CSL prosody. Finally, using the 15 descriptors, 29 samples of CSL learners’ speech were evaluated by four raters. A combination of structural equation modeling and Many-Facets Rasch modeling confirmed that all 15 descriptors fit well with the construct of prosody ability measured, demonstrating good validity of this rating scale.
{"title":"Developing and validating a rating scale of speaking prosody ability for learners of Chinese as a second language","authors":"Sichang Gao, Mingwei Pan","doi":"10.21437/speechprosody.2022-166","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-166","journal":{"name":"Speech Prosody 2022"},"publicationDate":"2022-05-23","platform":"Semanticscholar","paperid":"126867657"}
Pub Date : 2022-05-23 DOI: 10.21437/speechprosody.2022-68
Kakeru Yazawa, M. Kondo
A wide range of rhythm metrics (global metrics: %V, Δ, Varco, and segVarco; pairwise metrics: rPVI, nPVI, CCI, and D nCCI) was applied to L1 Japanese speakers’ L2 English speech data. Less proficient Japanese speakers of English are expected to show less durational variability for both vocalic and consonantal intervals (because of insufficient stress realization and transfer of CV syllable structure), although this pattern may be obscured by their slower speech rate (which increases interval durations in general). To test if the metrics can capture the L2 rhythmic characteristics, each metric was applied to read speech samples of “The North Wind and the Sun” by 183 Japanese speakers in the J-AESOP corpus. Only %V, VarcoV, and segVarcoV/C were successful; other metrics yielded inconsistent or implausible results likely due to insufficient rate normalization. The overall results indicate that global metrics can effectively quantify L2 rhythm if speech rate is normalized by the mean duration of segments (which is a good predictor of tempo) rather than the mean interval duration (which is popular but susceptible to syllable complexity).
{"title":"A Comparison of Rhythm Metrics for L2 Speech","authors":"Kakeru Yazawa, M. Kondo","doi":"10.21437/speechprosody.2022-68","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-68","journal":{"name":"Speech Prosody 2022"},"publicationDate":"2022-05-23","platform":"Semanticscholar","paperid":"125986412"}
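Two of the global metrics named in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: %V is taken as the vocalic share of total interval duration, and Varco as the standard deviation of interval durations divided by a normalizing duration, times 100. The population standard deviation is an arbitrary choice here, and the optional `norm` argument is only one reading of the abstract's suggestion to normalize by mean segment duration rather than mean interval duration. The names `percent_v` and `varco` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def percent_v(vocalic: Sequence[float], consonantal: Sequence[float]) -> float:
    """%V: proportion of total duration occupied by vocalic intervals."""
    v_total = sum(vocalic)
    c_total = sum(consonantal)
    return 100.0 * v_total / (v_total + c_total)


def varco(intervals: Sequence[float], norm: Optional[float] = None) -> float:
    """Rate-normalized variability of interval durations.

    With norm=None the standard deviation is divided by the mean interval
    duration (the usual Varco). Passing a mean segment duration as `norm`
    is an assumption about how a segment-based variant could be computed.
    """
    denominator = mean(intervals) if norm is None else norm
    return 100.0 * pstdev(intervals) / denominator
```

For example, `varco(vowel_intervals)` would give a VarcoV-style value, while `varco(vowel_intervals, norm=mean_segment_duration)` changes only the rate normalization and leaves the variability term untouched.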
Pub Date : 2022-05-23 DOI: 10.21437/speechprosody.2022-171
Liang Zhao, Shayne Sloggett, Eleanor Chodroff
Speech processing involves active integration of bottom-up and top-down information types. In the present study, we investigated the relative weighting of top-down expectedness and bottom-up lexical tone in the perception of familiar and unfamiliar lexical tone systems. Standard Mandarin and Chengdu Mandarin are mutually intelligible language varieties with comparable segmental and highly distinct tonal realizations. In a spoken semantic-plausibility judgment task, we manipulated whether a word was high-surprisal or low-surprisal given the preceding context and dialect-specific tone. All participants were native Standard Mandarin speakers with minimal Chengdu Mandarin experience. Lower judgment accuracy was observed when the stimulus was Chengdu Mandarin, and suggested that expectedness (i.e., top-down) information overrides tonal (i.e., bottom-up) information in sentence plausibility judgments. However, judgment response times to sentence surprisal were uniform across stimuli from both dialects, suggesting that speakers are aware of the surprisal conveyed by a non-standard tone, even if not used in their final decision. These findings reveal listener sensitivity to both top-down expectedness and bottom-up tone regardless of the initial tone reliability. For unfamiliar tone systems, top-down influence overrides bottom-up processing to access utterance meaning, but bottom-up processing is indeed present and may reflect rapid learning of the unfamiliar tone system.
{"title":"Top-Down and Bottom-up Processing of Familiar and Unfamiliar Mandarin Dialect Tone Systems","authors":"Liang Zhao, Shayne Sloggett, Eleanor Chodroff","doi":"10.21437/speechprosody.2022-171","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-171","journal":{"name":"Speech Prosody 2022"},"publicationDate":"2022-05-23","platform":"Semanticscholar","paperid":"126205707"}
Pub Date : 2022-05-23 DOI: 10.21437/speechprosody.2022-26
J. Kruyt, S. Benus, C. Faget, C. Lançon, M. Champagne-Lavau
Entrainment refers to the tendency people have to speak more similarly during a conversation. Although entrainment has been observed frequently, the underlying mechanisms of the phenomenon are debated. A specific point of disagreement is the role of social or higher-order cognitive factors in entrainment. The present study aimed to explore prosodic and lexical entrainment in small groups of individuals with schizophrenia, a disorder that has been associated with theory of mind impairments and social difficulties, and a control group without schizophrenia. All participants completed a referential communication task with an experimenter. To determine prosodic entrainment, the measures proposed by Levitan and Hirshberg [1] were used. Results seem to suggest that the effect of task role on prosodic entrainment was larger than any possible effects of group, suggesting that social factors affect prosodic entrainment behaviour more than individual differences in cognition or other factors. Conversely, lexical entrainment was not affected by task role or group. Importantly, no clear patterns in entrainment on different dimensions, levels, or features could be observed, highlighting the complex and multifaceted nature of entrainment.
{"title":"Prosodic and lexical entrainment in adults with and without schizophrenia","authors":"J. Kruyt, S. Benus, C. Faget, C. Lançon, M. Champagne-Lavau","doi":"10.21437/speechprosody.2022-26","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-26","journal":{"name":"Speech Prosody 2022"},"publicationDate":"2022-05-23","platform":"Semanticscholar","paperid":"121455741"}