
Phonetics and Speech Sciences: Latest Articles

End-to-end non-autoregressive fast text-to-speech
Pub Date : 2021-12-01 DOI: 10.13064/ksss.2021.13.4.047
Wiback Kim, Hosung Nam
Citations: 0
A comparative study of prosodic features according to the syntactic diversities between children with reading disability and nondisabled children*
Pub Date : 2021-12-01 DOI: 10.13064/ksss.2021.13.4.055
Sung-Sun Park, Cheol-jae Seong
Citations: 0
The perception and production of Korean vowels by Egyptian learners*
Pub Date : 2021-12-01 DOI: 10.13064/ksss.2021.13.4.023
S. Benjamin, Ho-Young Lee
This study discusses how Egyptian learners of Korean perceive and categorize Korean vowels, how Koreans perceive the Korean vowels that the learners pronounce, and how the learners’ Korean vowel categorization affects their perception and production of Korean vowels. In Experiment 1, 53 Egyptian learners listened to Korean test words pronounced by Koreans and chose the word they had heard from among 4 confusable words. In Experiment 2, 117 sound files (13 test words × 9 Egyptian learners) recorded by Egyptian learners were presented to Koreans, who were asked to select the word they had heard from among 4 confusable words. The results show that “new” Korean vowels that have no categorizable counterparts in Egyptian Arabic easily formed new categories and were therefore well identified in perception and relatively well pronounced, although some of them were poorly produced. In contrast, Egyptian learners distinguished “similar” Korean vowels poorly in perception, but their pronunciation of these vowels was relatively well identified by native Koreans. Based on these results, we argue that the Speech Learning Model (SLM) and the Perceptual Assimilation Model (PAM) explain L2 speech perception well but are insufficient to explain L2 speech production, and therefore need to be revised and extended to cover it.
Citations: 1
A longitudinal analysis on interruption in preschool children who stutter during interactions with their mothers*
Pub Date : 2021-12-01 DOI: 10.13064/ksss.2021.13.4.075
Hyo-Jung Kwak, Si-Hyeon Hwang, Pu Song, H. Sim, Soo-Bok Lee
The purpose of this study was to investigate, longitudinally, the interruption behavior shown by children who stutter (CWS), children who do not stutter (CWNS), and their mothers during mother-child interactions, and its relationship with the children's disfluency. The subjects were 2- to 5-year-old CWS (2 male and 4 female), an age-matched group of CWNS (3 male and 3 female), and their mothers. Frequencies of normal disfluency (ND) and abnormal disfluency (AD) in the child groups, and frequency of interruption and simultalk duration in the child and mother groups, were measured twice (at the initial visit and 12 months later) over the course of one year. No significant difference was observed in frequency of interruption or simultalk duration, either between the two mother groups or between the two child groups, at the initial visit or 12 months later. However, frequency of interruption increased significantly over the course of one year in the CWS group. A significant group difference was found in the mothers' frequency of interruption, but no significant difference was observed in the mothers' simultalk duration at the initial visit. In the CWS-mothers group, no factors were related to the children's disfluency at the initial visit or 12 months later. These findings suggest that interruption is not just negative behavior, and that reducing interruption should be considered in child-parent interaction therapy for CWS.
Citations: 0
The f0 distribution of Korean speakers in a spontaneous speech corpus*
Pub Date : 2021-09-01 DOI: 10.13064/ksss.2021.13.3.031
Byunggon Yang
The fundamental frequency, or f0, is an important acoustic measure in the prosody of human speech. The current study examined the f0 distribution of a corpus of spontaneous speech in order to provide normative data for Korean speakers. The corpus consists of 40 speakers talking freely about their daily activities and their personal views. Praat scripts were created to collect f0 values, and a majority of obvious errors were corrected manually by watching and listening to the f0 contour on a narrow-band spectrogram. Statistical analyses of the f0 distribution were conducted using R. The results showed that the f0 values of all the Korean speakers were right-skewed, with a pointy distribution. The speakers produced spontaneous speech within a frequency range of 274 Hz (from 65 Hz to 339 Hz), excluding statistical outliers. The mode of the total f0 data was 102 Hz. The female f0 range, with a bimodal distribution, appeared wider than that of the male group. Regression analyses based on age and f0 values yielded negligible R-squared values. As the mode of an individual speaker could be predicted from the median, either the median or the mode could serve as a good reference for the individual f0 range. Finally, an analysis of the continuous f0 points of intonational phrases revealed that the initial and final segments of the phrases yielded several f0 measurement errors. From these results, we conclude that an examination of a spontaneous speech corpus can provide linguists with useful measures to generalize the acoustic properties of f0 variability in a language by individuals or groups. Further studies on the use of statistical measures to secure reliable f0 values for individual speakers would be desirable.
Citations: 0
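The descriptive measures named in the f0 distribution abstract above (range, median, mode, skewness) can be illustrated with a minimal Python sketch. This is not the author's Praat or R procedure: the function name `f0_summary`, the choice to bin values to the nearest hertz before taking the mode, and the sample values are all assumptions made for illustration, and outlier exclusion is not handled.

```python
import statistics


def f0_summary(f0_hz):
    """Summarize f0 values (Hz): range, median, mode, and skewness."""
    values = sorted(f0_hz)
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Mode after rounding to the nearest Hz (an assumed, simple binning choice).
    mode = statistics.mode([round(v) for v in values])
    # Population skewness: a positive value means a longer right tail.
    skewness = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    return {
        "min": values[0],
        "max": values[-1],
        "range": values[-1] - values[0],
        "median": statistics.median(values),
        "mode": mode,
        "skewness": skewness,
    }


# Hypothetical f0 samples in Hz; these are NOT data from the study.
sample = [98, 102, 102, 102, 105, 110, 118, 130, 160, 210]
summary = f0_summary(sample)
```

A positive `skewness` value corresponds to the right-skewed shape the abstract reports, and the reported range follows the same subtraction as here (339 Hz minus 65 Hz gives 274 Hz).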
Perception of Japanese word-initial stops by native listeners*
Pub Date : 2021-09-01 DOI: 10.13064/ksss.2021.13.3.053
Hi-Gyung Byun
It is known that the voicing contrast for Japanese word-initial stops is primarily realized as differences in the voice onset time (VOT). However, recent studies have reported that voiced stops are more often produced with a positive VOT than with a negative VOT among the younger generation nationwide. It is also known that post-stop F0 is associated with the stop contrast, but the degree of F0 use differs from region to region. This study explores whether the difference in post-stop F0 functions as a perceptual cue to the stop contrast along with VOT. Fifty-five college students who are native listeners from four different regions participated in two or three perception tests. The results show that VOT is a primary cue to the voiced-voiceless distinction of word-initial stops, but that the effect of post-stop F0 on the stop contrast is marginal. The post-stop F0 is involved in perception only when VOT is ambiguous, such that a sound with high F0 is more often perceived as a voiceless stop, but not vice versa. The results of this study indicate that the acoustic parameters associated with the stop contrast are not the same in production and perception, and suggest that other factors such as context, which is not an acoustic characteristic, may also be involved in the stop contrast.
Citations: 1
Relationship between executive function and cue weighting in Korean stop perception across different dialects and ages*
Pub Date : 2021-09-01 DOI: 10.13064/ksss.2021.13.3.021
Eun Jong Kong, Hyunjung Lee
The present study investigated how one’s cognitive resources are related to speech perception by examining Korean speakers’ executive function (EF) capacity and its association with voice onset time (VOT) and f0 sensitivity in identifying Korean stop laryngeal categories (/t’/ vs. /t/ vs. /tʰ/). Previously, Kong et al. (under revision) reported that Korean listeners (N = 154) in Seoul and Changwon (Gyeongsang) showed differential group patterns in dialect-specific cue weightings across educational institutions (college, high school, and elementary school). We follow up this study by further relating their EF control (working memory, mental flexibility, and inhibition) to their speech perception patterns to examine whether better cognitive ability would control attention to multiple acoustic dimensions. Partial correlation analyses revealed that better EFs in Korean listeners were associated with greater sensitivity to available acoustic details and with greater suppression of irrelevant acoustic information across subgroups, although only a small set of EF components turned out to be relevant. Unlike Seoul participants, Gyeongsang listeners’ f0 use was not correlated with any EF task scores, reflecting dialect-specific cue primacy using f0 as a secondary cue. The findings confirm the link between speech perception and general cognitive ability, providing experimental evidence from Korean listeners.
Citations: 0
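The executive function abstract above relies on partial correlation. As a rough illustration of what that statistic measures, the sketch below computes a first-order partial correlation between two variables while controlling for a third, using the standard formula based on pairwise Pearson correlations. The abstract does not say which variables were partialled out, so using age as the control variable, along with the variable names and the scores, is purely an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical scores for six listeners; these are NOT data from the study.
ef_score = [12, 15, 11, 18, 14, 16]
f0_sensitivity = [0.42, 0.55, 0.40, 0.63, 0.50, 0.58]
age = [9, 17, 10, 21, 16, 20]  # assumed control variable
r_partial = partial_corr(ef_score, f0_sensitivity, age)
```

When the control variable is uncorrelated with both of the other variables, the partial correlation reduces to the ordinary Pearson correlation, which is a convenient sanity check for the formula.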
Effects of speech motor practice and linguistic complexity on articulation rate in adults who stutter*
Pub Date : 2021-09-01 DOI: 10.13064/ksss.2021.13.3.091
HeeCheong Chon, T. Loucks
This study aimed to investigate speech motor control in adults who stutter (AWS) by testing whether articulation rate changes with practice and linguistic complexity. Eleven AWS and 11 adults who do not stutter (AWNS) repeated four sentences of different lengths and syntactic complexity [simple-short (SS), simple-long (SL), complex-long (CL), and faulty-long (FL) sentences]. Overall articulation rates of each sentence were measured and compared between groups. Practice effects were evaluated by comparing the articulation rates of the first three, middle four, and last three productions. Overall, the AWS had significantly slower articulation rates than AWNS across the four sentences. The longer sentences showed significantly slower articulation rates than the baseline sentence (SS). The articulation rates of the middle four and the last three productions were significantly faster than those of the first three productions of each sentence in both groups. The articulation rates of the SS, SL, and CL sentences indicated a consistent practice effect. The slower articulation rates of the AWS are consistent with a speech motor limitation. There was no interaction with linguistic complexity or practice, so a slower articulation rate may be a general feature of the speech of AWS. Both AWS and AWNS showed practice effects with faster articulation rates which may reflect a degree of adaptation to the stimuli.
Citations: 0
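The practice-effect comparison in the articulation rate abstract above (first three, middle four, and last three productions) implies ten repetitions per sentence. A minimal sketch of that grouping follows. The abstract does not define how articulation rate was computed, so the syllables-per-second definition with pause time removed, the function names, and the durations are assumptions for illustration only.

```python
def articulation_rate(n_syllables, duration_s, pause_s=0.0):
    """Syllables per second of articulated speech (pause time removed)."""
    return n_syllables / (duration_s - pause_s)


def practice_blocks(rates):
    """Mean rate for the first three, middle four, and last three of ten productions."""
    if len(rates) != 10:
        raise ValueError("expected ten productions")

    def mean(xs):
        return sum(xs) / len(xs)

    return {
        "first": mean(rates[:3]),
        "middle": mean(rates[3:7]),
        "last": mean(rates[7:]),
    }


# Hypothetical durations (s) of ten repetitions of a 12-syllable sentence;
# these are NOT data from the study.
durations = [3.0, 3.0, 3.0, 2.4, 2.4, 2.4, 2.4, 2.0, 2.0, 2.0]
rates = [articulation_rate(12, d) for d in durations]
blocks = practice_blocks(rates)
```

With these made-up durations the block means rise from the first block to the last, which is the direction of the practice effect the abstract describes for both groups.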
Text-to-speech with linear spectrogram prediction for quality and speed improvement
Pub Date : 2021-09-01 DOI: 10.13064/ksss.2021.13.3.071
Hyebin Yoon, Hosung Nam
Most neural-network-based speech synthesis models utilize neural vocoders to convert mel-scaled spectrograms into high-quality, human-like voices. However, neural vocoders combined with mel-scaled spectrogram prediction models demand considerable computer memory and time during the training phase and are subject to slow inference speeds in an environment where GPU is not used. This problem does not arise in linear spectrogram prediction models, as they do not use neural vocoders, but these models suffer from low voice quality. As a solution, this paper proposes a Tacotron 2 and Transformer-based linear spectrogram prediction model that produces high-quality speech and does not use neural vocoders. Experiments suggest that this model can serve as the foundation of a high-quality text-to-speech model with fast inference speed.
Citations: 0
Korean speakers hyperarticulate vowels in polite speech*
Pub Date : 2021-09-01 DOI: 10.13064/ksss.2021.13.3.015
Eunhae Oh, Bodo Winter, K. Idemaru
In line with recent attention to the multimodal expression of politeness, the present study examined the association between polite speech and acoustic features through the analysis of vowels produced in casual and polite speech contexts in Korean. Fourteen adult native speakers of Seoul Korean produced the utterances in two social conditions to elicit polite (professor) and casual (friend) speech. Vowel duration and the first (F1) and second (F2) formants of seven sentence- and phrase-initial monophthongs were measured. The results showed that polite speech shares acoustic similarities with vowel production in clear speech: speakers showed greater vowel space expansion in polite than in casual speech in an effort to enhance perceptual intelligibility. In particular, female speakers hyperarticulated (front) vowels for polite speech, independent of speech rate. The implications for the acoustic encoding of social stance in polite speech are further discussed.
Citations: 3