The effects of syntactic and acoustic cues on the perception of prosodic boundaries
Pub Date: 2022-05-23 | DOI: 10.21437/speechprosody.2022-142
Jianjing Kuang, May Pik Yu Chan, Nari Rhee
This study investigates how the perception of prosodic boundaries is shaped by syntactic phrasing and acoustic cues for English and Mandarin listeners. Syntactically parsed speech corpora were used as the stimuli for the perception experiment. The relative strength of the syntactic boundary on both the left and right sides of the constituents was extracted from the syntactic parsing annotations. A wide range of acoustic cues at both prosodic domain-final and domain-initial positions were examined. Linear mixed-effects modeling of the likelihood of boundary perception suggests that, for both languages, prosodic boundary perception was influenced by both the strength of the syntactic boundary and acoustic cues: boundary perception was heavily driven by the presence of pause; pause also modulated the contribution of other acoustic cues; and larger syntactic boundaries were generally more likely to be perceived as prosodic boundaries. However, there is also cross-linguistic variation: the effect of syntactic phrasing cues was generally stronger for English; acoustically, the effects of final lengthening and pitch reset were stronger in English, while pause was the dominant cue in Mandarin. We discuss the implications of these findings for the nature of the prosodic hierarchy and the prosody-syntax interface.
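As a rough illustration of the kind of linear mixed-effects model described above, the sketch below fits boundary-perception responses with statsmodels; the input file, column names (syn_strength, pause_dur, final_lengthening, pitch_reset) and the random-effects structure are illustrative assumptions, not taken from the paper.

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per inter-word juncture; all columns are hypothetical.
df = pd.read_csv("boundary_ratings.csv")

# Boundary-perception likelihood as a function of syntactic
# boundary strength and acoustic cues, with by-listener random
# intercepts (the paper's exact predictors may differ).
model = smf.mixedlm(
    "boundary_rating ~ syn_strength + pause_dur"
    " + final_lengthening + pitch_reset",
    data=df,
    groups=df["listener"],
)
print(model.fit().summary())
```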
{"title":"The effects of syntactic and acoustic cues on the perception of prosodic boundaries","authors":"Jianjing Kuang, May Pik Yu Chan, Nari Rhee","doi":"10.21437/speechprosody.2022-142","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-142","url":null,"abstract":"This study investigates how the perception of prosodic boundaries is shaped by syntactic phrasing and acoustic cues for English and Mandarin listeners. Syntactically-parsed speech corpora were used as the stimuli for the perception experiment. The relative strength of the syntactic boundary of both the left and right sides of the constituents was extracted from the syntactic parsing annotations. A wide range of acoustic cues of both prosodic domain-final and domain-initial positions were examined. Linear-mixed-effects modeling of the likelihood of boundary perception suggests that, for both languages, prosodic boundary perception was influenced by both the strength of syntactic boundary and acoustic cues: boundary perception was heavily driven by the presence of pause; pause also modulated the contribution of other acoustic cues; and larger syntactic boundaries were generally more likely to be perceived as prosodic boundaries. However, there is also cross-linguistic variation: the effect of syntactic phrasing cues was generally stronger for English; acoustically, the effect of final lengthening and pitch reset was stronger in English, while pause was the dominant cue in Mandarin. We discuss the important implica-tions of these findings related to the nature of prosodic hierar-chy, and the nature of the prosody-syntax interface.","PeriodicalId":442842,"journal":{"name":"Speech Prosody 2022","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124522828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human self-domestication and the evolution of prosody
Pub Date: 2022-05-23 | DOI: 10.21437/speechprosody.2022-138
A. Benítez‐Burraco, Wendy Elvira-García
Human self-domestication is a recent evolutionary hypothesis. According to this view, humans have undergone changes similar to those observed in domesticated mammals, changes that have provided us with many of the behavioural and perhaps cognitive prerequisites for supporting our social practices and advanced culture. At the core of this hypothesis is the claim that self-domestication is triggered by a reduction in reactive aggression. Following findings of increased complexity in the communicative signals of domesticated animals compared to their wild conspecifics, the human self-domestication hypothesis has been used to account for the sophistication of the grammars of human languages. Nonetheless, less research has been done in the domain of phonology. In this talk, we apply this evolutionary model to the evolution of human prosody, arguing for a progressive complexification of prosody that parallels (and is triggered by) the complexification of grammar, likewise in response to a reduction in reactive aggression levels. Two types of evidence support our claim: the parallel complexification of prosody and grammar found in emerging sign languages, and the parallel sophistication of prosody and grammar during language acquisition, which in turn parallels increased control over the mechanisms involved in reactive aggression.
{"title":"Human self-domestication and the evolution of prosody","authors":"A. Benítez‐Burraco, Wendy Elvira-García","doi":"10.21437/speechprosody.2022-138","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-138","url":null,"abstract":"Human self-domestication refers to a new evolutionary hypothesis. According to this view, humans have experienced changes that are similar to those observed in domesticated mammals and that have provided us with many of the behavioural and perhaps cognitive pre-requisites for supporting our social practices and advanced culture. At the core of this hypothesis is the claim that self-domestication is triggered by a reduction in reactive aggression. Since the findings of increased complexity in the communicative signals of domesticated animals compared to their wild conspecific, the human self-domestication hypothesis has been used to account for the sophistication of the grammars of human languages. Nonetheless, less research has been done in the domain of phonology. In this talk, we apply this evolutionary model to the evolution of human prosody, arguing for a progressive complexification of prosody that parallels (and is triggered by) the complexification of grammar, also in response to a reduction in reactive aggression levels. Two different types of evidence support our claim: the parallel complexification of prosody and grammar found in emerging sign languages and the parallel sophistication of prosody and grammar during language acquisition, which in turn parallels an increased control over the mechanisms involved in reactive aggression.","PeriodicalId":442842,"journal":{"name":"Speech Prosody 2022","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114415414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beware of the individual: Evaluating prominence perception in spontaneous speech
Pub Date: 2022-05-23 | DOI: 10.21437/speechprosody.2022-55
Anna Bruggeman, Leonie Schade, M. Wlodarczak, P. Wagner
Much of the existing research on prominence perception has focused on read speech in American English and German. The present paper presents two experiments that build on and extend insights from these studies in two ways. First, we elicit prominence judgments on spontaneous speech. Second, we investigate gradient rather than binary prominence judgments by introducing a finger-tapping task. We then provide a within-participant comparison of gradient prominence results with binary prominence judgments to evaluate their correspondence. Our results show that participants differ in how successfully they tap the prominence pattern of spontaneous data, but tapping results generally correlate well with binary prominence judgments within individuals. Random forest analyses of the acoustic parameters involved show that pitch accentuation and duration play important roles in both binary judgments and prominence tapping patterns. We also confirm earlier findings from read speech that participants differ in the relative importance they assign to various signal-based and systematic properties.
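A random forest analysis of this kind can be sketched with scikit-learn as below; the feature names and input file are hypothetical stand-ins for the acoustic parameters examined in the paper, and impurity-based importances only approximate whatever variable-importance measure the authors report.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-syllable feature table.
df = pd.read_csv("prominence_features.csv")
features = ["pitch_accented", "duration", "intensity", "spectral_tilt"]
X, y = df[features], df["prominent"]  # binary prominence judgment

rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X, y)

# Rank acoustic parameters by impurity-based importance.
for name, imp in sorted(zip(features, rf.feature_importances_),
                        key=lambda pair: -pair[1]):
    print(f"{name}: {imp:.3f}")
```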
{"title":"Beware of the individual: Evaluating prominence perception in spontaneous speech","authors":"Anna Bruggeman, Leonie Schade, M. Wlodarczak, P. Wagner","doi":"10.21437/speechprosody.2022-55","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-55","url":null,"abstract":"Much of the existing research on prominence perception has focused on read speech in American English and German. The present paper presents two experiments that build on and extend insights from these studies in two ways. Firstly, we elicit prominence judgments on spontaneous speech. Secondly, we investigate gradient rather than binary prominence judgments by introducing a finger tapping task. We then provide a within-participant comparison of gradient prominence results with binary prominence judgments to evaluate their correspondence. Our results show that participants exhibit different success rates in tapping the prominence pattern of spontaneous data, but generally tapping results correlate well with binary prominence judgments within individuals. Random forest analyses of the acoustic parameters involved show that pitch accentuation and duration play important roles in both binary judgments and prominence tapping patterns. We can also confirm earlier findings from read speech that differences exist between participants in the relative importance rankings of various signal and systematic properties.","PeriodicalId":442842,"journal":{"name":"Speech Prosody 2022","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115084224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Influence of Tone on the Alignment of Speech and Co-Speech Gesture
Pub Date: 2022-05-23 | DOI: 10.21437/speechprosody.2022-63
Kathryn Franich, Hermann Keupdjio
Evidence continues to accrue suggesting that co-speech gestures form an integrated part of the prosodic system of languages. Several studies have highlighted a tight link between the timing of gestures of the hands and head and syllables bearing prosodic prominence. Most work to date has examined this relationship in Indo-European languages, where gestures appear to be crucially timed with respect to pitch-accented syllables. Less work has examined the timing of co-speech gestures in tonal languages, where pitch plays quite a different role within the phonological system. Here, we examine the influence of tone on the timing of manual co-speech gestures in Medʉmba, a Grassfields Bantu language spoken in Cameroon. We investigate 1) whether certain tones are more likely than others to associate with manual gestures in the language; and 2) whether the fine timing of the speech-gesture relationship is influenced by the tone or relative fundamental frequency (f0) of the syllable the gesture co-occurs with. Our findings indicated no preference for any one tone to occur with co-speech gestures. However, gesture apexes were found to align significantly later with respect to the accompanying syllable's vowel for low-toned syllables than for syllables with other tones.
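The alignment measure at issue can be expressed as a simple lag between the gesture apex and a landmark in the accompanying syllable. The function below is a minimal sketch assuming apex and vowel-onset timestamps are available; it illustrates the construct, not the authors' analysis code.

```python
def apex_lag(apex_time_s: float, vowel_onset_s: float) -> float:
    """Gesture apex timing relative to vowel onset, in seconds.

    Positive values mean the apex falls after vowel onset,
    i.e. later alignment, as reported for low-toned syllables.
    """
    return apex_time_s - vowel_onset_s

# Example: an apex 80 ms after the vowel onset.
print(round(apex_lag(1.48, 1.40), 3))  # 0.08
```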
{"title":"The Influence of Tone on the Alignment of Speech and Co-Speech Gesture","authors":"Kathryn Franich, Hermann Keupdjio","doi":"10.21437/speechprosody.2022-63","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-63","url":null,"abstract":"Evidence continues to accrue suggesting that co-speech gestures form an integrated part of the prosodic system of languages. Several studies have highlighted a tight link between the timing of gestures of the hands and head with syllables bearing prosodic prominence. Most work to date has examined this relationship in Indo-European languages, where gestures appear to be crucially timed with respect to pitch-accented syllables. Less work has examined the timing of co-speech gestures in tonal languages, where pitch plays quite a different role within the phonological system. Here, we examine the influence of tone on the timing of manual co-speech gestures in Medmba, a Grassfields Bantu language spoken in Cameroon. We investigate 1) whether certain tones are more likely than others to associate with manual gestures in the language; and 2) whether the fine timing of the speech-gesture relationship is influenced by the tone or relative fundamental frequency ( f 0 ) of the syllable it co-occurs with. Our findings indicated no preference for any one tone to occur with co-speech gestures. However, gesture apexes were found to align significantly later with respect to the accompanying syllable’s vowel for low-toned syllables as compared with syllables of other tones.","PeriodicalId":442842,"journal":{"name":"Speech Prosody 2022","volume":"17 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116863930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Effects of Intonation on the Sentence-Final Particle nyei in Iu-Mien
Pub Date: 2022-05-23 | DOI: 10.21437/speechprosody.2022-32
E. Thurgood, Paul Olejarczuk
This study focuses on the interaction of tone and intonation in the prosodic realization of the sentence-final particle nyei³³ in Iu-Mien, a Hmong-Mien language spoken in parts of China and Southeast Asia. While the intonation patterns of questions in colloquial Iu-Mien, in which the sentence-final particle nyei³³ does not typically occur, have been described, the intonation patterns of sentence-final nyei³³ as used in less colloquial settings have not yet been analyzed. Our study aims to fill this gap. Using data from five female speakers, we show that the mid-level tone 33 of nyei³³ is preserved in statement-final position but surfaces as a rising or falling contour at the end of yes-no questions. In addition, we find coarticulatory effects of the preceding tone on the F0 contour of the particle.
{"title":"The Effects of Intonation on the Sentence-Final Particle nyei in Iu-Mien","authors":"E. Thurgood, Paul Olejarczuk","doi":"10.21437/speechprosody.2022-32","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-32","url":null,"abstract":"This study focuses on the interaction between tone and intonation on the prosodic realization of the sentence-final particle nyei³³ in Iu-Mien, a Hmong-Mien language spoken in parts of China and Southeast Asia. While intonation patterns of questions in colloquial Iu-Mien, in which sentence-final particle nyei³³ does not typically occur, have been described, intonation patterns with the sentence-final nyei³³ used in less colloquial settings have not been analyzed yet. Our study aims to fill this gap. Using data from five female speakers, we show that the mid-level tone 33 of nyei³³ is preserved when in the final position of statements, but surfaces as a rising or falling contour at the end of yes-no questions. In addition, we find coarticulatory effects of the preceding tone on the F0 contour of the particle.","PeriodicalId":442842,"journal":{"name":"Speech Prosody 2022","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116889624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Expressing information status through prosody in the spontaneous speech of American English-speaking children
Pub Date: 2022-05-23 | DOI: 10.21437/speechprosody.2022-21
Jill C. Thorson, Jill M. Trumbell, Kimberly D. Nesbitt
Prosody is used to express information structure and information status differences in American English. For this study, our motivation was to analyze these abilities during an ecologically valid interaction in which we traded experimental control for more natural spontaneous speech. We ask how children package information when playing with their parents during exhibit exploration in a children's museum. Specifically, we employed a MAE_ToBI analysis to examine the production of new and given information status differences during these interactions. Parent-child dyads were recorded while playing in a museum exhibit. Preliminary analyses were conducted on one 4-year-old, one 5-year-old, and one 6-year-old speaker. As predicted, a particular set of pitch accents occurred frequently, alongside considerable variation in nuclear configuration patterns due to pragmatic effects. While pitch accent types largely stayed the same across the three ages analyzed to date, the H+!H* pitch accent was found only in the speech of the 4-year-old. These data add to our knowledge of how pitch accent selection relates to both information status and the pragmatics of the discourse.
{"title":"Expressing information status through prosody in the spontaneous speech of American English-speaking children","authors":"Jill C. Thorson, Jill M. Trumbell, Kimberly D. Nesbitt","doi":"10.21437/speechprosody.2022-21","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-21","url":null,"abstract":"Prosody is used to express information structure and status differences in American English. For this study, our motivation was to analyze these abilities during an ecologically valid interaction where we traded control for more natural spontaneous speech. We ask how children package information when playing with their parents during exhibit exploration in a children’s museum. Specifically, we employed a MAE_ToBI analysis to look at the production of new and given information status differences during these interactions. Parent-child dyads were recorded while playing in a museum exhibit at a children’s museum. Preliminary analyses were conducted on one 4-year-old, one 5-year-old, and one 6-year-old speaker. As predicted, we found a particular set of pitch accents to be commonly found as well as considerable variation in nuclear configuration patterns due to pragmatic effects. While pitch accent types largely stayed the same over the three ages analyzed to date, the H+!H* pitch accent was only found in the speech of the 4-year-old speaker. These data continue to add to the knowledge of how pitch accent selection relates to both information status and the pragmatics of the discourse.","PeriodicalId":442842,"journal":{"name":"Speech Prosody 2022","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124824133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Short-Term Periodicity of Prosodic Phrasing: Corpus-based Evidence
Pub Date: 2022-05-23 | DOI: 10.21437/speechprosody.2022-141
S. Stehwien, Lars Meyer
Speech is perceived as a sequence of meaningful units of various lengths, from phones to phrases. Prosody is one of the means by which these are segmented: prosodic boundaries subdivide utterances into prosodic phrases. In this corpus study, we examine prosodic boundaries from a neurolinguistic perspective. To be perceived correctly, prosodic phrases must obey neurobiological constraints. In particular, electrophysiological processing has been argued to operate periodically, with one electrophysiological processing cycle devoted to the processing of exactly one prosodic phrase. We thus hypothesized that prosodic phrases as such should show periodicity. We analyze the DIRNDL corpus of German radio news, which has been annotated for intonational and intermediate phrases. We find that sequences of 2–5 intermediate phrases are periodic at 0.8–1.6 Hz within their superordinate intonation phrase. Across utterances, the duration of intermediate phrases alternates with the duration of superordinate intonation phrases, indicating a dependence between prosodic time scales. While the determinants of periodicity are unknown, the results are compatible with an association between periodic electrophysiological processing mechanisms and the rhythm of prosody. This contributes to closing the gap between the neurobiology of language and linguistic description.
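The reported periodicity (2–5 intermediate phrases at 0.8–1.6 Hz within an intonation phrase) amounts to a simple rate computation over annotated phrase spans. The sketch below illustrates the construct under that reading; it is not the paper's actual analysis pipeline.

```python
def ip_phrase_rate(n_intermediate: int, intonation_phrase_dur_s: float) -> float:
    """Rate of intermediate phrases within one intonation phrase, in Hz."""
    return n_intermediate / intonation_phrase_dur_s

# Example: 4 intermediate phrases inside a 3.2 s intonation phrase.
rate = ip_phrase_rate(4, 3.2)       # 1.25 Hz
print(0.8 <= rate <= 1.6)           # True: within the reported band
```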
{"title":"Short-Term Periodicity of Prosodic Phrasing: Corpus-based Evidence","authors":"S. Stehwien, Lars Meyer","doi":"10.21437/speechprosody.2022-141","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-141","url":null,"abstract":"Speech is perceived as a sequence of meaningful units of various lengths, from phones to phrases. Prosody is one of the means by which these are segmented: Prosodic boundaries sub-divide utterances into prosodic phrases. In this corpus study, we study prosodic boundaries from a neurolinguistic perspective. To be perceived correctly, prosodic phrases must obey neurobiological constraints. In particular, electrophysiological processing has been argued to operate periodically, with one electrophysiological processing cycle being devoted to the processing of exactly one prosodic phrase. We thus hypothesized that prosodic phrases as such should show periodicity. We assess the DIRNDL corpus of German radio news, which has been annotated for intonational and intermediate phrases. We find that sequences of 2–5 intermediate phrases are periodic at 0.8–1.6 Hertz within their superordinate intonation phrase. Across utterances, the duration of intermediate phrases alternates with the duration of superordinate intonation phrases, indicating a dependence of prosodic time scales. While the determinants of periodicity are unknown, the results are compatible with an asso-ciation between periodic electrophysiological processing mechanisms and the rhythm of prosody. This contributes to closing the gap between the the neurobiology of language and linguistic description.","PeriodicalId":442842,"journal":{"name":"Speech Prosody 2022","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124835251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High Rising Terminals in Dublin: forms, functions and gender
Pub Date: 2022-05-23 | DOI: 10.21437/speechprosody.2022-37
Julia Bongiorno, Sophie Herment
High Rising Terminals (HRTs), also known as Uptalk or Upspeak, are stylistic rises that can occur at the end of declarative statements. They have been studied in numerous varieties of English as well as in other languages. These rises have been shown to take different phonetic and phonological forms and to convey various pragmatic functions depending on the variety in which they occur. The present study provides a description of these forms and functions in Dublin (Republic of Ireland). Based on 5 speakers from the PAC-Dublin corpus, recorded in the Irish capital in 2018, the study shows that HRTs are mainly realized with late rises and nuclear rises and that they differ from interrogative and continuative rises, notably in being steeper than the latter. A sociolinguistic analysis of our corpus also shows that speaker gender influences the occurrence of the phenomenon, whereas age range does not appear to. This article thus provides a multidimensional analysis of stylistic rising tones in statements in Dublin.
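"Steeper" here refers to the slope of the terminal rise; one common way to quantify it is semitones per second, sketched below assuming a single start and end f0 for the rise. This illustrates the measure, not the authors' script.

```python
import math

def rise_slope_st_per_s(f0_start_hz: float, f0_end_hz: float,
                        rise_dur_s: float) -> float:
    """Slope of a terminal rise in semitones per second."""
    semitones = 12 * math.log2(f0_end_hz / f0_start_hz)
    return semitones / rise_dur_s

# Example: a rise from 180 Hz to 260 Hz over 250 ms.
print(round(rise_slope_st_per_s(180.0, 260.0, 0.25), 2))  # ~25.46 st/s
```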
{"title":"High Rising Terminals in Dublin: forms, functions and gender","authors":"Julia Bongiorno, Sophie Herment","doi":"10.21437/speechprosody.2022-37","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-37","url":null,"abstract":"High Rising Terminals, Uptalk, or Upspeak, are stylistic rises that can be found at the end of declarative statements. They have been studied in numerous varieties of English and in other languages too. It has been shown that these rises can take on different phonetic and phonological forms and convey various pragmatic functions depending on the varieties in which they are found. The present study provides a description of these forms and functions in Dublin (Republic of Ireland). Based on a corpus of 5 speakers from the PAC-Dublin corpus that was recorded in the Irish capital in 2018, the study shows that HRTs are mainly realized with late rises and nuclear rises and that they are different from interrogative and continuative rises, notably because they are steeper than the latter. A sociolinguistic analysis of our corpus also shows that the gender of the speakers has an influence on the occurrence of the phenomenon, which does not seem to be the case for age range. This article thus provides a multidimensional analysis of stylistic rising tones in statements in Dublin.","PeriodicalId":442842,"journal":{"name":"Speech Prosody 2022","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129701054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prosodic characteristics of canonical and non-canonical questions in Estonian
Pub Date: 2022-05-23 | DOI: 10.21437/speechprosody.2022-28
Heete Sahkai, Eva Liina Asu, P. Lippus
This paper presents a comparison of the prosodic characteristics of canonical questions with two types of non-canonical interrogative utterances in Estonian. The data consisted of string-identical interrogative sentences with the question word kuidas ('how') elicited in three readings: information-seeking question (ISQ), rhetorical question (RQ) and surprise question (SQ). A three-way distinction between the utterance types emerged. First, there was a binary distinction between canonical and non-canonical questions in mean pitch, utterance duration and voice quality: non-canonical questions were characterised by lower mean pitch, longer duration and a larger proportion of non-modal (creaky) voice quality. Second, there was a three-way distinction in pitch range: ISQs had the narrowest and SQs the widest pitch range, while RQs fell in between. Third, SQs were further distinguished from ISQs and RQs by a different placement of the focal accent and by the accentuation of pronouns. There were, however, no differences in intonational pitch accent types or boundary tones between the three utterance types. The results imply that the lower mean pitch signals the indirect illocutionary force of non-canonical questions, while the longer duration, non-modal voice quality and larger pitch range indicate their affective nature. SQs are additionally associated with a specific information structure.
{"title":"Prosodic characteristics of canonical and non-canonical questions in Estonian","authors":"Heete Sahkai, Eva Liina Asu, P. Lippus","doi":"10.21437/speechprosody.2022-28","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-28","url":null,"abstract":"This paper presents a comparison of the prosodic characteristics of canonical questions with two types of non-canonical interrogative utterances in Estonian. The data consisted of string-identical interrogative sentences with the question-word kuidas (‘how’) elicited in three readings: information-seeking question (ISQ), rhetorical question (RQ) and surprise question (SQ). A three-way distinction between the three utterance types emerged. First, there was a binary distinction between canonical and non-canonical questions in mean pitch, utterance duration and voice quality: non-canonical questions were characterised by lower mean pitch, longer duration and a larger proportion of non-modal (creaky) voice quality. Second, there was a three-way distinction in pitch range: ISQs had the narrowest and SQs the widest pitch range while RQs were in-between the two. Third, SQs were further distinguished from ISQs and RQs by a different placement of focal accent and the accentuation of pronouns. There were, however, no differences in intonational pitch accent types and boundary tones between the three utterance types.Theresults imply that the lower mean pitch signals the indirect illocutionary force of the non-canonical questions while the longer duration, non-modal voice quality and larger pitch range indicate their affective nature. SQs are additionally associated with a specific information structure.","PeriodicalId":442842,"journal":{"name":"Speech Prosody 2022","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127082948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Syllable duration as a proxy to latent prosodic features
Pub Date: 2022-05-23 | DOI: 10.21437/speechprosody.2022-45
Christina Tånnander, D. House, Jens Edlund
Recent advances in deep learning have pushed text-to-speech synthesis (TTS) very close to human speech. In deep learning, latent features are features that are hidden from us; nevertheless, we may meaningfully observe their effects. Analogously, latent prosodic features refer to the exact features that constitute, e.g., prominence, which are unknown to us, although we know (some of) the functions of prominence and (some of) its acoustic correlates. Deep-learned speech models capture prosody well but leave us with little control and few insights. Previously, we explored average syllable duration at the word level, a simple and accessible metric, as a proxy for prominence: in Swedish TTS, where verb particles and numerals tend to receive too little prominence, these were nudged towards lengthening while the TTS models were otherwise allowed to operate freely. Listener panels overwhelmingly preferred the nudged versions to the unmodified TTS. In this paper, we analyze utterances from the modified TTS. The analysis shows that duration-nudging of relevant words changes the following features in an observable manner: duration is predictably lengthened, word-initial glottalization occurs, and the general intonation pattern changes. This supports the view that latent prosodic features can be reflected in deep-learned models and accessed by proxy.
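The proxy metric itself is straightforward; a minimal sketch of it and of an illustrative lengthening nudge follows. The 15% factor and function names are assumptions for demonstration, not values from the paper, which nudges the TTS model towards longer durations rather than rescaling outputs outright.

```python
def mean_syllable_duration(word_dur_s: float, n_syllables: int) -> float:
    """Word-level average syllable duration: the proxy for prominence."""
    return word_dur_s / n_syllables

def nudge_duration(word_dur_s: float, factor: float = 1.15) -> float:
    """Lengthen a target word (e.g. a verb particle or numeral).

    The factor is illustrative; in the paper the model is nudged
    towards lengthening while otherwise operating freely.
    """
    return word_dur_s * factor

# Example: a 2-syllable verb particle lasting 300 ms.
print(mean_syllable_duration(0.3, 2))        # 0.15 s per syllable
print(round(nudge_duration(0.3), 3))         # 0.345 s after nudging
```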
{"title":"Syllable duration as a proxy to latent prosodic features","authors":"Christina Tånnander, D. House, Jens Edlund","doi":"10.21437/speechprosody.2022-45","DOIUrl":"https://doi.org/10.21437/speechprosody.2022-45","url":null,"abstract":"Recent advances in deep-learning have pushed text-to-speech synthesis (TTS) very close to human speech. In deep-learning, latent features refer to features that are hidden from us; notwithstanding, we may meaningfully observe their effects. Analogously, latent prosodic features refer to the exact features that constitute e.g. prominence that are unknown to us, although we know (some of) the functions of prominence and (some of) its acoustic correlates. Deep-learned speech models capture prosody well but leave us with little control and few insights. Previously, we explored average syllable duration on word level - a simple and accessible metric - as a proxy for prominence: in Swedish TTS, where verb particles and numerals tend to receive too little prominence, these were nudged towards lengthening while allowing the TTS models to otherwise operate freely. Listener panels overwhelmingly preferred the nudged versions to the unmodified TTS. In this paper, we analyze utterances from the modified TTS. The analysis shows that duration-nudging of relevant words changes the following features in an observable manner: duration is predictably lengthened, word-initial glottalization occurs, and the general intonation pattern changes. This supports the view of latent prosodic features that can be reflected in deep-learned models and accessed by proxy.","PeriodicalId":442842,"journal":{"name":"Speech Prosody 2022","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127275174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}