
Latest publications in Phonetics and Speech Sciences

How does focus-induced prominence modulate phonetic realizations for Korean word-medial stops?
Pub Date: 2020-12-01 DOI: 10.13064/KSSS.2020.12.4.057
Jiyoun Choi
Previous research has indicated that the patterns of phonetic modulations induced by prominence are not consistent across languages but are conditioned by sound systems specific to a given language. Most studies examining the prominence effects in Korean have been restricted to segments in word-initial and phrase-initial positions. The present study, thus, set out to explore the prominence effects for Korean stop consonants in word-medial intervocalic positions. A total of 16 speakers of Seoul Korean (8 males, 8 females) produced word-medial intervocalic lenis and aspirated stops with and without prominence. The prominence was induced by contrast focus on the phonation-type contrast, that is, lenis vs. aspirated stops. Our results showed that F0 of vowels following both lenis and aspirated stops became higher when the target stops received focus than when they did not, whereas voice onset time (VOT) and voicing during stop closure for both lenis and aspirated stops did not differ between the focus and no-focus conditions. The findings add to our understanding of diverse patterns of prominence-induced strengthening on the acoustic realizations of segments.
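The reported effect (higher post-stop F0 under focus, no VOT change) amounts to a paired comparison across focus conditions. A minimal sketch of that comparison, using hypothetical F0 values rather than the study's data:

```python
from statistics import mean

def mean_f0_rise(f0_focus, f0_nofocus):
    """Mean per-token F0 rise (Hz) of the post-stop vowel under focus,
    for paired tokens from the same speaker and item."""
    return mean(f - n for f, n in zip(f0_focus, f0_nofocus))

# hypothetical paired F0 values (Hz) for vowels following lenis stops
focus    = [245.0, 250.2, 238.9, 260.1]
no_focus = [230.4, 236.0, 229.5, 241.7]
print(mean_f0_rise(focus, no_focus) > 0)  # focus raises F0 in this toy data
```

The same pairing would be applied to VOT and closure voicing, where the study found no focus effect.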
Citations: 0
Classification of muscle tension dysphonia (MTD) female speech and normal speech using cepstrum variables and random forest algorithm*
Pub Date: 2020-12-01 DOI: 10.13064/KSSS.2020.12.4.091
Joowon Yun, Hee-Jeong Shim, Cheol-jae Seong
This study investigated the acoustic characteristics of the sustained vowel /a/ and sentence utterances produced by patients with muscle tension dysphonia (MTD) using cepstrum-based acoustic variables. Thirty-six women diagnosed with MTD and the same number of women with normal voice participated in the study, and the data were recorded and measured with ADSV™. The results demonstrated that, among all of the variables, cepstral peak prominence (CPP) and CPP_F0 were statistically significantly lower than those of the control group. On the GRBAS scale, overall severity (G) was most prominent in the voice quality of MTD patients, followed in order by the roughness (R), breathiness (B), and strain (S) indices. As these characteristics increased, a statistically significant negative correlation was observed in CPP. We then tried to classify the MTD and control groups using the CPP and CPP_F0 variables. As a result of statistical modeling with a Random Forest machine learning algorithm, much higher classification accuracy (100% on training data and 83.3% on test data) was found in the sentence reading task, with CPP proving to play the more crucial role in both the vowel and sentence reading tasks.
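To illustrate the classification idea without the paper's model or data, here is a toy stand-in for a random forest: a bootstrap ensemble of one-feature decision stumps over a single CPP value. The CPP numbers are hypothetical (MTD voices tend to show lower CPP, as the abstract reports), and the code is stdlib-only:

```python
import random

def stump_threshold(xs, ys):
    """Best single-feature threshold (from the data values) minimizing
    misclassification, predicting MTD (1) when CPP is below the threshold."""
    best_t, best_err = xs[0], float("inf")
    for t in xs:
        err = sum((1 if x < t else 0) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def forest_fit(xs, ys, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample of the training data."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]  # bootstrap sample
        thresholds.append(stump_threshold([xs[i] for i in idx],
                                          [ys[i] for i in idx]))
    return thresholds

def forest_predict(thresholds, x):
    """Majority vote over the stump ensemble."""
    votes = sum(x < t for t in thresholds)
    return 1 if votes > len(thresholds) / 2 else 0

# hypothetical CPP values (dB): MTD speakers (label 1) have lower CPP
cpp = [4.1, 3.8, 4.5, 3.9, 6.8, 7.2, 6.5, 7.0]
mtd = [1,   1,   1,   1,   0,   0,   0,   0]
model = forest_fit(cpp, mtd)
print([forest_predict(model, v) for v in [3.7, 7.5]])  # → [1, 0]
```

A real replication would use a full random forest (e.g. many-node trees over both CPP and CPP_F0), but the bootstrap-plus-voting structure is the same.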
Citations: 1
Perceptual cues for /o/ and /u/ in Seoul Korean
Pub Date: 2020-09-01 DOI: 10.13064/ksss.2020.12.3.001
Hi-Gyung Byun
Previous studies have confirmed that /o/ and /u/ in Seoul Korean are undergoing a merger in the F1/F2 space, especially for female speakers. As a substitute parameter for formants, female speakers are reported to use phonation (H1-H2) differences to distinguish /o/ from /u/. This study aimed to explore whether H1-H2 values are being used as perceptual cues for /o/-/u/. A perception test was conducted with 35 college students using /o/ and /u/ tokens spoken by 41 females, which overlap considerably in the vowel space. An acoustic analysis of the 182 stimuli was also conducted to see whether there is any correspondence between production and perception. The identification rate was 89% on average, 86% for /o/, and 91% for /u/. The results confirmed that when /o/ and /u/ cannot be distinguished in the F1/F2 space because they are too close, H1-H2 differences contribute significantly to the separation of the two vowels. In perception, however, this was not the case. H1-H2 values were not significantly involved in the identification process, and the formants (especially F2) were still the dominant cues. The study also showed that even though H1-H2 differences are apparent in females' production, males do not use H1-H2 in their production, and neither females nor males use H1-H2 in perception. It is presumed that H1-H2 has not yet developed into a perceptual cue for /o/ and /u/.
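H1-H2 is the amplitude difference (dB) between the first and second harmonics of the voice spectrum. A self-contained sketch that measures it on a synthetic signal with known harmonic amplitudes (1.0 and 0.5, so the expected value is 20·log10(2) ≈ 6 dB), using a single-bin DFT; this is an illustration of the measure, not the study's analysis pipeline:

```python
import cmath, math

def harmonic_amp(x, sr, freq):
    """Amplitude of one sinusoidal component via a single-bin DFT
    (assumes the window spans an integer number of cycles of freq)."""
    n = len(x)
    w = sum(x[t] * cmath.exp(-2j * math.pi * freq * t / sr) for t in range(n))
    return 2 * abs(w) / n

def h1_h2(x, sr, f0):
    """H1-H2 in dB: amplitude of the 1st harmonic relative to the 2nd."""
    a1 = harmonic_amp(x, sr, f0)
    a2 = harmonic_amp(x, sr, 2 * f0)
    return 20 * math.log10(a1 / a2)

sr, f0, dur = 8000, 200.0, 0.1          # 200 Hz f0, 0.1 s window = 20 cycles
n = int(sr * dur)
# synthetic vowel-like signal: H1 amplitude 1.0, H2 amplitude 0.5
x = [math.sin(2 * math.pi * f0 * t / sr)
     + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / sr) for t in range(n)]
print(round(h1_h2(x, sr, f0), 1))       # → 6.0
```

On real speech, H1-H2 is usually measured from a spectrum of the vowel with correction for formant influence, but the harmonic-amplitude subtraction is the core of the measure.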
Citations: 3
Compromised feature normalization method for deep neural network based speech recognition*
Pub Date: 2020-09-01 DOI: 10.13064/ksss.2020.12.3.065
M. Kim, H. S. Kim
Feature normalization reduces the effect of environmental mismatch between training and test conditions by normalizing the statistical characteristics of acoustic feature parameters. It yields excellent performance improvements in traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition systems. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance improvement. In this paper, we attribute the cause of this phenomenon to information loss due to excessive feature normalization. We investigate whether there is a feature normalization method that maximizes speech recognition performance by properly reducing the impact of environmental mismatch while preserving useful information for training acoustic models. To this end, we introduce mean and exponentiated variance normalization (MEVN), a compromise between mean normalization (MN) and mean and variance normalization (MVN), and compare the performance of a DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that a slight performance improvement is obtained with MEVN over MN and MVN, depending on the degree of variance normalization.
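The abstract does not give MEVN's exact formula, so the sketch below assumes the natural interpolation: subtract the mean and divide by the standard deviation raised to an exponent alpha, so that alpha=0 reduces to MN and alpha=1 to full MVN. This assumed form is an illustration only:

```python
from statistics import mean, pstdev

def mevn(frames, alpha=0.5):
    """Mean and exponentiated variance normalization (assumed form):
    subtract the mean, divide by std**alpha.
    alpha=0 -> mean normalization (MN); alpha=1 -> full MVN."""
    mu, sd = mean(frames), pstdev(frames)
    return [(f - mu) / (sd ** alpha) for f in frames]

# hypothetical values of one cepstral coefficient over an utterance
feats = [12.0, 13.5, 11.2, 14.1, 12.8]
for a in (0.0, 0.5, 1.0):
    out = mevn(feats, a)
    print(round(pstdev(out), 2))  # residual variance shrinks as alpha grows
```

In practice each feature dimension would be normalized independently per utterance or per speaker; intermediate alpha leaves some environmental variance in place, which is exactly the compromise the paper investigates.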
Citations: 0
A prosodic cue representing scopes of wh-phrases in Korean: Focusing on North Gyeongsang Korean*
Pub Date: 2020-09-01 DOI: 10.13064/ksss.2020.12.3.041
Weonhee Yun, Ki-tae Kim, Sunwoo Park
A wh-phrase in an embedded sentence may have either an embedded or a matrix scope. Interpretation of a wh-phrase with a matrix scope has tended to be syntactically unacceptable unless the sentence reads with a wh-intonation. Previous studies have found two differences in prosodic characteristics between sentences with matrix and embedded scopes. Firstly, peak F0s in wh-phrases produced with an F0 compression wh-intonation are higher than those in indirect questions, and peak F0s in matrix verbs are lower than those in sentences with embedded scope. Secondly, a substantial F0 drop is found at the end of embedded sentences in indirect questions, whereas no F0 reduction at the same point is noticed in sentences with a matrix scope produced with a high plateau wh-intonation. However, these characteristics were not found in our experiment. This showed that a more compelling difference exists in the values obtained from subtraction between the peak F0s of each word (or a word plus an ending or case marker) and the F0s at the end of the word. Specifically, the gap between the peak F0 in a word composed with an embedded verb and the F0 at the end of the word, which is a complementizer in Korean, is large in embedded wh-scope sentences and low in matrix wh-scope sentences.
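The cue the study settles on is a peak-minus-final F0 subtraction within a word. A minimal sketch with hypothetical F0 tracks (not the study's measurements):

```python
def f0_drop(word_f0):
    """Peak F0 minus word-final F0 (Hz) over one word's F0 track."""
    return max(word_f0) - word_f0[-1]

# hypothetical F0 tracks (Hz) over the embedded verb + complementizer
embedded_scope = [210, 248, 255, 230, 190]   # large peak-to-end fall
matrix_scope   = [215, 228, 226, 224, 222]   # high plateau, small fall
print(f0_drop(embedded_scope), f0_drop(matrix_scope))  # → 65 6
```

Per the abstract, a large gap between the word's F0 peak and its final F0 characterizes embedded wh-scope, while matrix wh-scope shows a small gap.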
Citations: 1
Comparison of vowel lengths of articles and monosyllabic nouns in Korean EFL learners’ noun phrase production in relation to their English proficiency
Pub Date: 2020-09-01 DOI: 10.13064/ksss.2020.12.3.033
Wooji Park, Ran Mo, S. Rhee
The purpose of this research was to find out the relation between Korean learners’ English proficiency and the ratio of the length of the stressed vowel in a monosyllabic noun to that of the unstressed vowel in an article of the noun phrases (e.g., “a cup”, “the bus”, etc.). Generally, the vowels in monosyllabic content words are phonetically more prominent than the ones in monosyllabic function words as the former carry phrasal stress, making the vowels in content words longer in length, higher in pitch, and louder in amplitude. This study, based on the speech samples from the Korean-Spoken English Corpus (K-SEC) and the Rated Korean-Spoken English Corpus (Rated K-SEC), examined 879 English noun phrases, each composed of an article and a monosyllabic noun, from sentences rated on 4 levels of proficiency. The lengths of the vowels in these 879 target NPs were measured and the ratio of the vowel lengths in nouns to those in articles was calculated. It turned out that the higher the proficiency level, the greater the mean ratio of the vowels in nouns to the vowels in articles, confirming the research’s hypothesis. This research thus concluded that the higher the Korean English learners' proficiency level, the better they could produce the stressed and unstressed vowels, with more conspicuous length differences between them.
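The ratio measure itself is a direct division; a minimal sketch with hypothetical vowel durations (not corpus values):

```python
def stress_ratio(noun_vowel_ms, article_vowel_ms):
    """Length of the stressed noun vowel relative to the
    unstressed article vowel in the same noun phrase."""
    return noun_vowel_ms / article_vowel_ms

# hypothetical measurements for "a cup" at low vs. high proficiency
low  = stress_ratio(140.0, 95.0)   # weak stressed/unstressed contrast
high = stress_ratio(165.0, 70.0)   # clearer contrast
print(round(low, 2), round(high, 2))
```

A higher ratio means a more native-like length contrast, which is the quantity the study found to grow with proficiency.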
Citations: 0
An analysis of emotional English utterances using the prosodic distance between emotional and neutral utterances
Pub Date: 2020-09-01 DOI: 10.13064/ksss.2020.12.3.025
S. Yi
An analysis of emotional English utterances with 7 emotions (calm, happy, sad, angry, fearful, disgust, surprised) was conducted using the measurement of prosodic distance between 672 emotional and 48 neutral utterances. Applying the technique proposed in the automatic evaluation model of English pronunciation to the present study on emotional utterances, Euclidean distance measurement of 3 prosodic elements such as F0, intensity and duration extracted from emotional and neutral utterances was utilized. This paper, furthermore, extended the analytical methods to include Euclidean distance normalization, z-score and z-score normalization, resulting in 4 groups of measurement schemes (sqrF0, sqrINT, sqrDUR; norsqrF0, norsqrINT, norsqrDUR; sqrzF0, sqrzINT, sqrzDUR; norsqrzF0, norsqrzINT, norsqrzDUR). All of the results from perceptual analysis and acoustical analysis of emotional utterances consistently indicated the greater effectiveness of norsqrF0, norsqrINT and norsqrDUR, the group of measurement schemes that normalized the Euclidean measurement. The greatest acoustical change of prosodic information influenced by emotion was shown in the values of F0, followed by duration and intensity in descending order of effect size, based on the estimation of distance between emotional utterances and their neutral counterparts. A Tukey post hoc test revealed 4 homogeneous subsets (calm<disgust, sad<happy, surprised<surprised, angry, fearful) statistically determined from the measurement of norsqrF0 and 3 homogeneous subsets (surprised, happy, fearful, sad, calm<calm, angry<angry, disgust) from norsqrDUR. Furthermore, the analysis of each of the 7 emotions showed that the present research outcome is in the same vein as the results of the previous study.
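The core measurement is a Euclidean distance between an emotional utterance's prosodic contour and its neutral counterpart, with optional z-score normalization. The paper combines F0, intensity and duration and its exact windowing is not given in the abstract, so this sketch uses hypothetical F0 contours only:

```python
import math
from statistics import mean, pstdev

def zscore(xs):
    """Standardize a contour: zero mean, unit (population) variance."""
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]

def prosodic_distance(emotional, neutral, normalize=True):
    """Euclidean distance between two equal-length prosodic contours,
    optionally z-scored first (in the spirit of the z-score schemes)."""
    a, b = (zscore(emotional), zscore(neutral)) if normalize else (emotional, neutral)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# hypothetical F0 contours (Hz) at 5 measurement points
neutral = [180.0, 185.0, 182.0, 178.0, 175.0]
angry   = [220.0, 250.0, 230.0, 200.0, 190.0]
print(prosodic_distance(angry, neutral) > 0)
```

Z-scoring removes overall level and range differences so that only contour-shape differences contribute, which is one plausible reading of why the normalized (norsqr*) schemes performed best.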
Citations: 1
A comparison of Korean vowel formants in conditions of chanting and reading utterances*
Pub Date: 2020-09-01 DOI: 10.13064/ksss.2020.12.3.085
Jihye Park, Cheol-jae Seong
Vowel articulation tends to be difficult for subjects with speech disorders. A chant method that properly reflects the characteristics of language could be an effective way of addressing this difficulty. The purpose of this study was to find out whether the chant method is effective as a means of enhancing vowel articulation. The subjects were 60 normal adults (30 males and 30 females) in their 20s and 30s whose native language is Korean. Eight utterance conditions, including chanting and reading, were recorded and their acoustic data were analyzed. Analysis of the formant-related acoustic variables confirmed that the F1 and F2 values of the vowel formants increased and that the center of gravity of the vowel triangle moved statistically significantly forward and downward under the chant method, in both the word and phrase contexts. The results also proved that accent is the most influential musical factor in chant. There was no significant difference between the four repeated tokens, which increased the reliability of the results. In other words, chanting is an effective way to shift the center of gravity of the vowel triangle, which suggests that it can help to improve speech intelligibility by forming a desirable place for articulation.
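The vowel-triangle center of gravity is simply the mean (F1, F2) of the corner vowels. A sketch with hypothetical formant values (not the study's measurements); note that a "lowered and fronted" centroid means higher F1 and higher F2:

```python
def triangle_centroid(formants):
    """Center of gravity (mean F1, mean F2) of the corner vowels /a, i, u/."""
    f1 = sum(f[0] for f in formants) / len(formants)
    f2 = sum(f[1] for f in formants) / len(formants)
    return f1, f2

# hypothetical (F1, F2) pairs in Hz for /a/, /i/, /u/ in two conditions
reading  = [(800, 1300), (300, 2300), (350, 800)]
chanting = [(860, 1400), (320, 2450), (370, 900)]
r, c = triangle_centroid(reading), triangle_centroid(chanting)
# chanting centroid lowered (higher F1) and fronted (higher F2)
print(c[0] > r[0], c[1] > r[1])  # → True True
```

The same computation extends to a vowel polygon with more vowels, and the shift direction of the centroid is what the study tested statistically.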
在与语言障碍相关的科目中,元音发音似乎很困难。一种恰当地反映语言特点的吟诵方法可以作为解决困难的有效途径。本研究的目的是为了了解吟诵法是否有效地作为一种提高元音发音的手段。本次研究的对象是母语为韩国语的20、30多岁的正常成年人60名(男女各30名)。记录了诵经和诵读等8种话语状态,并对其声学数据进行了分析。与共振峰相关的声学变量分析结果证实,在单词和短语上下文中,吟诗法中元音共振峰的F1和F2值增加,元音三角重心的运动方向有统计学上的显著前移和降低。结果还表明,口音是影响吟诵的最重要的音乐因素。四个重复标记之间没有显著差异,这增加了结果的可靠性。换句话说,吟诵是一种有效的方式来转移元音三角形的重心,这表明它可以通过形成一个理想的发音位置来帮助提高语音的可理解性。
Jihye Park & Cheol-jae Seong (2020). A comparison of Korean vowel formants in conditions of chanting and reading utterances. Phonetics and Speech Sciences. DOI: 10.13064/ksss.2020.12.3.085
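The "center of gravity of the vowel triangle" tracked in this study is simply the centroid of the corner vowels in (F1, F2) space. A minimal sketch of the computation, using invented formant values rather than the study's actual measurements:

```python
# Center of gravity of the vowel triangle: the centroid of the corner
# vowels /a/, /i/, /u/ in (F1, F2) space. The formant values below are
# illustrative placeholders, not data from the study.

def vowel_triangle_cog(corners):
    """corners: dict mapping vowel -> (F1, F2) in Hz. Returns (F1, F2) centroid."""
    f1s = [f1 for f1, f2 in corners.values()]
    f2s = [f2 for f1, f2 in corners.values()]
    n = len(corners)
    return sum(f1s) / n, sum(f2s) / n

reading = {"a": (800, 1300), "i": (300, 2300), "u": (350, 800)}
chanting = {"a": (860, 1400), "i": (330, 2450), "u": (380, 880)}

cog_read = vowel_triangle_cog(reading)
cog_chant = vowel_triangle_cog(chanting)

# A rise in mean F1 lowers the center of gravity (more open articulation);
# a rise in mean F2 moves it forward (more fronted articulation).
delta_f1 = cog_chant[0] - cog_read[0]
delta_f2 = cog_chant[1] - cog_read[1]
print(delta_f1, delta_f2)
```

With these toy numbers both deltas are positive, which matches the direction of change the study reports for chanting: the centroid shifts forward and downward.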
Citations: 0
Comparisons of voice quality parameter values measured with MDVP, Praat, and TF32
Pub Date : 2020-09-01 DOI: 10.13064/ksss.2020.12.3.073
Hyeju Ko, M. Woo, Yaelin Choi
Measured values may differ between Multi-Dimensional Voice Program (MDVP), Praat, and Time-Frequency Analysis software (TF32), all of which are widely used in voice quality analysis, because of differences in the algorithms each analyzer uses. Therefore, this study aimed to compare the values of normal-voice parameters measured with each analyzer. Tokens of the vowel sound /a/ were collected from 35 normal adult subjects (19 male and 16 female) and analyzed with MDVP, Praat, and TF32. The mean values obtained from Praat for jitter variables (J local, J abs, J rap, and J ppq), shimmer variables (S local, S dB, and S apq), and noise-to-harmonics ratio (NHR) were significantly lower than those from MDVP in both males and females (p<.01). The mean values of J local, J abs, and S local decreased significantly in the order MDVP, Praat, TF32 in both genders. In conclusion, the measured values differed across voice analyzers because of differences in their algorithms. Therefore, it is important for clinicians to understand the normal criteria used by each analyzer before using it to analyze pathologic voice in clinical practice.
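The jitter and shimmer parameters compared above follow the standard definitions: local jitter is the mean absolute difference between consecutive pitch periods divided by the mean period, and local shimmer is the analogous ratio over cycle peak amplitudes. A minimal sketch with invented period and amplitude values (real analyzers also differ in how they extract periods, which is one source of the discrepancies reported):

```python
# Local jitter (%): mean absolute difference between consecutive pitch
# periods, divided by the mean period. Local shimmer (%) is the same
# computation over cycle peak amplitudes. MDVP, Praat, and TF32 differ
# mainly in how the underlying periods are extracted from the waveform.

def local_perturbation(values):
    """Relative mean absolute difference of consecutive values, in %."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_val = sum(values) / len(values)
    return 100.0 * mean_diff / mean_val

# Invented pitch periods (seconds) and peak amplitudes for a steady /a/.
periods = [0.00502, 0.00498, 0.00501, 0.00499, 0.00500]
amps = [0.81, 0.79, 0.80, 0.82, 0.80]

jitter_local = local_perturbation(periods)   # normal voices are typically below ~1%
shimmer_local = local_perturbation(amps)
print(jitter_local, shimmer_local)
```

The same `local_perturbation` routine yields jitter when fed periods and shimmer when fed amplitudes; the analyzer-specific differences the study measures arise upstream, in period extraction and smoothing, not in this final ratio.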
Citations: 4
Voice-to-voice conversion using transformer network*
Pub Date : 2020-09-01 DOI: 10.13064/ksss.2020.12.3.055
June-Woo Kim, H. Jung
Voice conversion can be applied to various voice-processing applications and can also play an important role in data augmentation for speech recognition. The conventional approach combines voice conversion with speech synthesis, using the Mel filter bank as the main parameter. The Mel filter bank is well suited to fast neural-network computation but cannot be converted into a high-quality waveform without the aid of a vocoder, and it is not effective for obtaining data for speech recognition. In this paper, we focus on performing voice-to-voice conversion using only the raw spectrum. We propose a deep learning model based on the transformer network, which quickly learns the voice conversion properties using an attention mechanism between source and target spectral components. The experiments were performed on TIDIGITS data, a series of numbers spoken by an English speaker. The converted voices were evaluated for naturalness and similarity using the mean opinion score (MOS) obtained from 30 participants. Our final results yielded 3.52±0.22 for naturalness and 3.89±0.19 for similarity.
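The attention mechanism that aligns source and target spectral components can be sketched as plain scaled dot-product attention. The following is an illustrative stdlib-only sketch with toy two-dimensional "frames", not the authors' model (which adds learned projections, multiple heads, and a full encoder-decoder stack):

```python
import math

# Scaled dot-product attention, the core operation a transformer-based
# converter uses to align target spectral frames (queries) with source
# frames (keys/values). The frames here are tiny made-up vectors.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """queries: T_q x d, keys/values: T_k x d. Returns T_q x d."""
    d = len(keys[0])
    out = []
    for q in queries:
        # similarity of this query frame to every source frame, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # weighted average of the source value frames
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Two source "spectral frames" and one target query frame (toy numbers).
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[1.0, 0.0]]
converted = attention(tgt, src, src)
print(converted)
```

Because the query matches the first source frame more closely, the output is a convex combination weighted toward that frame; in a trained converter, these weights are what let the model pull spectral detail from the relevant source positions.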
Citations: 0
Phonetics and Speech Sciences
Copyright © 2023 Book学术 All rights reserved.