2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE): Latest Publications
Cross-linguistic perception of Chinese attitudes praising and blaming
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357875
Ping Tang, Lei Liu, Shanpeng Li, Wentao Gu
This study compared the perception of Chinese sentences conveying the attitudinal contrast of praising versus blaming across five groups of subjects: Chinese natives, Japanese L2 learners of Mandarin, French L2 learners of Mandarin, and Japanese and French subjects with no knowledge of Mandarin. Context-elicited target sentences conveying a praising, blaming or neutral attitude were used as stimuli in a listening experiment. Subjects rated each stimulus on a five-point scale (obviously praiseful, somewhat praiseful, neutral, somewhat critical, obviously critical), and their evaluations were recorded and analyzed. The results showed that (a) Chinese subjects performed best among all groups, and L2 learners performed better than naïve subjects with no knowledge of Chinese; and (b) French subjects tended to under-evaluate the praising attitude compared with Chinese and Japanese subjects. Correlations between the subjects' evaluations and acoustic parameters known to be important for conveying affect were then examined, and the groups' discrepancies in perceiving these attitudes are discussed in that light.
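The analysis details are not given in the abstract; as a rough illustration only, the Python sketch below shows how such rating-acoustics correlations could be computed. All stimulus ratings and acoustic values in it are hypothetical placeholders, not data from the study.

```python
# Hedged sketch: correlating listeners' attitude ratings with acoustic
# parameters of the stimuli. All values are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr

# Mean rating per stimulus on the 5-point scale
# (1 = obviously critical ... 5 = obviously praiseful), hypothetical values.
ratings = np.array([4.6, 3.9, 3.1, 2.2, 1.5, 4.1, 2.8, 1.9])

# Hypothetical acoustic measurements for the same stimuli.
acoustics = {
    "mean_f0_hz":        np.array([245, 230, 210, 195, 180, 238, 205, 188]),
    "mean_intensity_db": np.array([68, 66, 64, 61, 59, 67, 63, 60]),
    "duration_s":        np.array([1.8, 1.9, 2.0, 2.2, 2.3, 1.8, 2.1, 2.2]),
}

for name, values in acoustics.items():
    r, p = pearsonr(ratings, values)
    print(f"{name:18s} r = {r:+.2f}, p = {p:.3f}")
```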
{"title":"Cross-linguistic perception of Chinese attitudes praising and blaming","authors":"Ping Tang, Lei Liu, Shanpeng Li, Wentao Gu","doi":"10.1109/ICSDA.2015.7357875","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357875","url":null,"abstract":"This study compared the perceptions of Chinese sentences conveying the attitudinal contrast of praising and blaming by five groups of subjects (Chinese natives, Japanese L2 learners of Mandarin, French L2 learners of Mandarin, Japanese and French subjects without any Mandarin ability). Context-elicited target sentences conveying praising, blaming or neutral attitude were used as stimuli in the listening experiment. The subjects were asked to give their evaluations of the stimuli based on a five-point scale (obviously praiseful, somewhat praiseful, neutral, somewhat critical and obviously critical). The subjects' evaluations were recorded and analyzed. The results of the listening experiment shown that (a) Chinese subjects performed best among all subjects in the listening experiment, and L2 learners performed better than naíve subjects without any Chinese ability; also, (b) French subjects tended to under-evaluated the attitude of praising compared to Chinese and Japanese subjects. The correlations between the subjects' evaluations and certain acoustic parameters which were demonstrated to be important in regard to conveying affects were examined, by which the subjects' discrepancies in perceiving these attitudes were further discussed.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131446402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
English Rhythm of Guangxi Zhuang EFL learners
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357867
Jing Shu, Yirong Luo, Yang Yang, Jing Li, Difang Zhou
This study explores the English rhythm of EFL learners through a phonetic experiment on a reading-aloud task performed by English learners from the Guangxi Zhuang Autonomous Region. The subjects are grouped into first-year non-major learners (NM1), second-year non-major learners (NM2), first-year major learners (M1) and second-year major learners (M2). On the basis of segmental analyses, four acoustic rhythm metrics are calculated: the proportion of vocalic intervals (%V) and consonantal variability (ΔC) proposed by Ramus et al. (1999), and the vocalic normalized Pairwise Variability Index (nPVI) and intervocalic raw Pairwise Variability Index (rPVI) proposed by Grabe and Low (2002). The findings show that the English-major learners improve more than the non-major learners in rhythmic production, suggesting that systematic instruction in explicit phonetic and phonological knowledge, together with comprehensive L2 input and sufficient L2 output, helps to bridge the marked rhythmic difference between the learners' L1 and L2.
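The four metrics named above have standard definitions, so a minimal sketch of how they could be computed from hand-segmented vocalic and consonantal interval durations may be helpful; the interval values used here are hypothetical.

```python
# Hedged sketch: the four rhythm metrics named in the abstract, computed
# from interval durations (in seconds). Interval values are hypothetical.
import numpy as np

def percent_v(vocalic, consonantal):
    """%V: proportion of utterance duration that is vocalic (Ramus et al. 1999)."""
    v, c = np.sum(vocalic), np.sum(consonantal)
    return 100.0 * v / (v + c)

def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations."""
    return np.std(consonantal, ddof=1)

def npvi(intervals):
    """Normalized Pairwise Variability Index (Grabe & Low 2002)."""
    d = np.asarray(intervals, dtype=float)
    pairs = np.abs(d[1:] - d[:-1]) / ((d[1:] + d[:-1]) / 2.0)
    return 100.0 * np.mean(pairs)

def rpvi(intervals):
    """Raw Pairwise Variability Index (Grabe & Low 2002)."""
    d = np.asarray(intervals, dtype=float)
    return np.mean(np.abs(d[1:] - d[:-1]))

# Hypothetical segmentation of one read sentence.
vocalic     = [0.09, 0.14, 0.07, 0.18, 0.11, 0.06]
consonantal = [0.08, 0.05, 0.12, 0.07, 0.10]

print(f"%V     = {percent_v(vocalic, consonantal):.1f}")
print(f"DeltaC = {delta_c(consonantal) * 1000:.1f} ms")
print(f"nPVI-V = {npvi(vocalic):.1f}")
print(f"rPVI-C = {rpvi(consonantal) * 1000:.1f} ms")
```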
{"title":"English Rhythm of Guangxi Zhuang EFL learners","authors":"Jing Shu, Yirong Luo, Yang Yang, Jing Li, Difang Zhou","doi":"10.1109/ICSDA.2015.7357867","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357867","url":null,"abstract":"The study intends to explore EFL learners' Rhythm of English through a phonetic experiment over the reading-aloud task produced by the English learners from Guangxi Zhuang Autonomous Region. The tested subjects are grouped as the first-year non-major learners(NM1), the second-year non-major learners(NM2), the first-year major learners(M1) and the second-year major learners(M2). On the basis of the segmental analyses, four acoustic rhythmic indices are calculated: vowel quantity (V%) and consonantal variance (Δ C) proposed by Ramus et al (1999), vocalic normalized Pairwise Variability Index (nPVI) and intervocalic raw Pairwise Variability Index (rPVI) proposed by Grabe and Low (2002). The findings show that the English-major learners have a greater improvement than the non-major learners in the rhythmic production, suggesting that systematic instruction over the explicit phonetic and phonological knowledge together with comprehensive L2 input and sufficient L2 output helps to bridge the rhythmic marked difference between the learners' L1 and L2.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"68 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114120451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contrastive study of focus phonetic realization between Jinan dialect and Taiyuan dialect
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357863
Duan Wenjun, Jia Yuan
Focus is generally considered to bear a communicative function in discourse, and each language has its own ways of realizing it. This paper compares focus realization in the Jinan and Taiyuan dialects. It investigates the similarities and differences in focus realization by examining variations in mean F0, duration and intensity under focused and unfocused conditions in the two dialects. The data show that F0 is the acoustic correlate most closely associated with focus in both dialects. Both dialects compress the pitch range in the post-focus region while leaving the pitch range of pre-focus syllables unchanged; in addition, focus in the Jinan dialect is realized by expanding the pitch range of on-focus syllables, whereas in the Taiyuan dialect the pitch range of on-focus syllables shows no significant expansion. Furthermore, no obvious differences in syllable duration or intensity between the focused and unfocused conditions are found in either dialect.
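As a hedged illustration of the kind of focused/unfocused comparison described above, the sketch below runs a paired comparison of mean F0 over hypothetical syllable measurements; it is not the paper's actual analysis or data.

```python
# Hedged sketch: comparing a syllable-level acoustic measure between the
# focused and unfocused renditions of the same sentences. All values are
# hypothetical; the paper's actual statistics may differ.
import numpy as np
from scipy import stats

# Hypothetical mean F0 (Hz) of on-focus syllables vs. the same syllables
# produced in the neutral (unfocused) condition.
f0_focused   = np.array([231, 248, 225, 240, 236, 252, 229, 244])
f0_unfocused = np.array([210, 222, 205, 215, 212, 226, 208, 219])

# Paired comparison, since each sentence is produced in both conditions.
t, p = stats.ttest_rel(f0_focused, f0_unfocused)
rise = np.mean(f0_focused - f0_unfocused)
print(f"mean on-focus F0 rise: {rise:.1f} Hz, t = {t:.2f}, p = {p:.4f}")
```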
{"title":"Contrastive study of focus phonetic realization between Jinan dialect and Taiyuan dialect","authors":"Duan Wenjun, Jia Yuan","doi":"10.1109/ICSDA.2015.7357863","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357863","url":null,"abstract":"It is usually considered that focus bears communicative function in discourse, each language has its own ways to realize focus. This paper compares the focus realization of Jinan dialect and Taiyuan dialect. It aims to investigate the similarity and difference of focus realization through examining the variations of mean F0, duration and intensity in both focused and unfocused conditions between these two dialects. The data represents that F0 is the closest acoustic correlate with focus in both dialects. Suppressing the pitch range in the post-focus region and leaving the pitch range of the pre-focus syllables unchanged exist in all the dialects; meanwhile, focus in Jinan dialect is also realized by expanding the pitch range of all the syllables on-focus, but in Taiyuan dialect the pitch range of all the syllables on-focus doesn't observed significant expanding. Furthermore, no obvious differences in syllable duration and intensity between the focused and unfocused condition are discovered in all these dialects.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114305850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward improving estimation accuracy of emotion dimensions in bilingual scenario based on three-layered model
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357858
Xingfeng Li, M. Akagi
This paper proposes a newly revised three-layered model to improve the estimation of emotion dimensions (valence, activation) in a bilingual scenario, using knowledge of the commonalities and differences in human perception across languages. Most previous speech emotion recognition systems worked only within a single language. To construct a generalized emotion recognition system that can detect emotions across multiple languages, however, acoustic feature selection and feature normalization across languages remain open issues. In this study, features correlated with the emotion dimensions are selected to construct the proposed model. To imitate emotion perception across languages, a novel normalization method is introduced that extracts the direction and distance from neutral to each other emotion in the emotion dimensional space. Results show that the proposed system yields mean absolute error reductions of 46% and 34% for Japanese and German, respectively, over the previous system, attaining estimation performance closer to human evaluation in the bilingual case.
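The abstract describes normalization by the direction and distance from neutral in the emotion dimensional space; the sketch below shows one possible reading of that idea in a two-dimensional valence-activation space, with hypothetical coordinate values.

```python
# Hedged sketch: representing an emotional utterance by its direction and
# distance from the neutral centroid in valence-activation space, as one
# possible reading of the normalization described in the abstract.
import numpy as np

def direction_distance(point, neutral):
    """Return (unit direction vector, Euclidean distance) from neutral."""
    diff = np.asarray(point, float) - np.asarray(neutral, float)
    dist = np.linalg.norm(diff)
    direction = diff / dist if dist > 0 else np.zeros_like(diff)
    return direction, dist

# Hypothetical mean (valence, activation) ratings on a [-1, 1] scale.
neutral_centroid = np.array([0.05, -0.02])
joy_utterance    = np.array([0.70, 0.55])

direction, distance = direction_distance(joy_utterance, neutral_centroid)
print("direction:", np.round(direction, 2), "distance:", round(distance, 2))
```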
{"title":"Toward improving estimation accuracy of emotion dimensions in bilingual scenario based on three-layered model","authors":"Xingfeng Li, M. Akagi","doi":"10.1109/ICSDA.2015.7357858","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357858","url":null,"abstract":"This paper proposes a newly revised three-layered model to improve emotion dimensions (valence, activation) estimation for bilingual scenario, using knowledge of commonalities and differences of human perception among multiple languages. Most of previous systems on speech emotion recognition only worked in each mono-language. However, to construct a generalized emotion recognition system which be able to detect emotions for multiple languages, acoustic features selection and feature normalization among languages remained a topic. In this study, correlated features with emotion dimensions are selected to construct proposed model. To imitate emotion perception across languages, a novel normalization method is addressed by extracting direction and distance from neutral to other emotion in emotion dimensional space. Results show that the proposed system yields mean absolute error reduction rate of 46% and 34% for Japanese and German language respectively over previous system. The proposed system attains estimation performance more comparable to human evaluation on bilingual case.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125162353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development of new speech corpus for elderly Japanese speech recognition
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357859
Y. Iribe, N. Kitaoka, Shuhei Segawa
We have constructed a new speech corpus from the utterances of 100 elderly Japanese people in order to improve speech recognition accuracy for the speech of older adults. Humanoid robots are being developed for use in elder-care nursing homes. Interaction with such robots is expected to help maintain the cognitive abilities of nursing home residents, as well as providing them with companionship. For these robots to interact with elderly people through spoken dialogue, a high-performance speech recognition system for elderly speech is needed. To develop such a system, we recorded speech from 100 elderly Japanese, most of whom live in nursing homes, with an average age of 77.2 years. An earlier seniors' speech corpus, S-JNAS, had participants with an average age of 67.6 years, whereas the target age for nursing home care is around 75, considerably higher than in the S-JNAS samples. In this paper we compare our new corpus with an existing Japanese read-speech corpus of adult speech, JNAS, and with the above-mentioned S-JNAS, the senior version of JNAS.
{"title":"Development of new speech corpus for elderly Japanese speech recognition","authors":"Y. Iribe, N. Kitaoka, Shuhei Segawa","doi":"10.1109/ICSDA.2015.7357859","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357859","url":null,"abstract":"We have constructed a new speech data corpus, using the utterances of 100 elderly Japanese people, to improve speech recognition accuracy of the speech of older people. Humanoid robots are being developed for use in elder care nursing homes. Interaction with such robots is expected to help maintain the cognitive abilities of nursing home residents, as well as providing them with companionship. In order for these robots to interact with elderly people through spoken dialogue, a high performance speech recognition system for speech of elderly people is needed. To develop such a system, we recorded speech uttered by 100 elderly Japanese, most of them are living in nursing homes, with an average age of 77.2. Previously, a seniors' speech corpus named S-JNAS was developed, but the average age of the participants was 67.6 years, but the target age for nursing home care is around 75 years old, much higher than that of the SJNAS samples. In this paper we compare our new corpus with an existing Japanese read speech corpus, JNAS, which consists of adult speech, and with the above mentioned S-JNAS, the senior version of JNAS.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124502581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-linguistic prosodic comparison with OMProDat database
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357894
Hongwei Ding, D. Hirst, R. Hoffmann
OMProDat is an open multilingual prosodic database that aims to collect, archive and distribute recordings and annotations of directly comparable data from languages with different prosodic typological characteristics. OMProDat contains recordings of 40 five-sentence passages read by 5 male and 5 female speakers of each language. The database currently covers five languages: English, French, Mandarin Chinese, Cantonese and Korean. This study introduces the construction of a German subset and describes a prosodic comparison of the resulting six languages. Eighteen melodic metrics were automatically extracted for each language and can discriminate the six languages with an accuracy of 61.88%. Moreover, these melodic metrics also illustrate the dynamic melodic characteristics of the dialect represented in the German database.
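As an illustration of the kind of evaluation that could underlie the 61.88% figure, the sketch below cross-validates a simple linear classifier on per-passage melodic-metric vectors; the feature values are random stand-ins, not OMProDat measurements, and the paper's actual method may differ.

```python
# Hedged sketch: discriminating languages from per-passage melodic metrics
# with a simple classifier. Feature values are random stand-ins, not
# OMProDat data, so the resulting accuracy is only illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
languages = ["en", "fr", "zh", "yue", "ko", "de"]
n_passages, n_metrics = 40, 18

# One 18-dimensional melodic-metric vector per passage per language.
X = np.vstack([rng.normal(loc=i, scale=2.0, size=(n_passages, n_metrics))
               for i in range(len(languages))])
y = np.repeat(languages, n_passages)

scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2%}")
```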
{"title":"Cross-linguistic prosodic comparison with OMProDat database","authors":"Hongwei Ding, D. Hirst, R. Hoffmann","doi":"10.1109/ICSDA.2015.7357894","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357894","url":null,"abstract":"OMProDat is an open multilingual prosodic database, which aims to collect, archive and distribute recordings and annotations of directly comparable data from different languages representing different prosodic typological characteristics. OMProDat contains recordings of 40 five-sentence passages read by 5 male and 5 female speakers of each language. Currently the database contains recordings for five languages: English, French, Mandarin Chinese, Cantonese and Korean. In this study, the generation of German database is introduced, and a prosodic comparison of these six languages is described. 18 melodic metrics were automatically extracted from each language, and can discriminate these six languages with an accuracy rate of 61.88%. Moreover, the dynamic melodic characteristics of the dialect in the German database can also be illustrated with these melodic metrics.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117053023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis on L2 learners' perception errors between geminate and singleton of Japanese consonants using loudness related parameters
Pub Date: 2015-12-14 | DOI: 10.1109/ICSDA.2015.7357889
Yanlong Zhang, Mee Sonu, H. Kato, Y. Sagisaka
To better understand the difficulty that second language (L2) learners have in identifying Japanese geminate versus singleton consonants, a perceptual factor is newly introduced to address the shortcomings of conventional explanations that rely solely on acoustic duration differences. To systematically explain the serious speech-rate-related errors in geminate/singleton identification in fast and slow speech, loudness-related parameters are used to reflect perceptual characteristics of duration. Correlation analyses show that these parameters provide a reasonable explanation for the error patterns obtained in a perceptual experiment on Japanese geminate/singleton consonants with Korean learners. This suggests a new possibility for designing L2 teaching materials for Japanese geminate/singleton consonants in different phonetic contexts, based on the difficulties expected from their loudness differences.
{"title":"Analysis on L2 learners' perception errors between geminate and singleton of Japanese consonants using loudness related parameters","authors":"Yanlong Zhang, Mee Sonu, H. Kato, Y. Sagisaka","doi":"10.1109/ICSDA.2015.7357889","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357889","url":null,"abstract":"For better understanding of the identification difficulties in Japanese geminate/singleton consonants for second language (L2) learners, a perceptual factor is newly introduced to supply the insufficiencies of conventional explanations solely using acoustic duration differences. To systematically explain speech-rate related serious errors of geminate/singleton identification in fast/slow speech, loudness related parameters are used to reflect perceptual characteristics on duration. Correlation analyses have shown that these parameters can provide reasonable explanation for error characteristics obtained in the perceptual experiment on Japanese geminate/singleton consonants by Korean learners. A new possibility is suggested to design L2 teaching materials for Japanese geminate/singleton consonants with different phonetic contexts based on expected difficulties resulting from their loudness differences.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114361092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A study of the production of unstressed vowels by Japanese speakers of English using the J-AESOP corpus
Pub Date: 2015-12-14 | DOI: 10.1109/ICSDA.2015.7357872
Kakeru Yazawa, Yumi Ozaki, Greg Short, M. Kondo, Y. Sagisaka
This study investigated the production of English unstressed vowels by Japanese speakers based on the J-AESOP corpus. Four acoustic features associated with unstressed vowels in English were examined: duration, intensity, fundamental frequency and vowel quality. Comparative analysis with native English speakers' productions revealed that the Japanese speakers achieved good control of all the acoustic features except vowel quality, supporting the results of previous studies. Their attainment of near-native control of duration, intensity and fundamental frequency can be attributed to positive L1 transfer from Japanese. As for vowel quality, the present study showed that the Japanese speakers' unstressed vowels were more peripheral in quality than the native English speakers', owing to the difficulty of L2 phoneme acquisition and the influence of orthography.
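Peripherality of vowel quality is often operationalized as the distance of a token's (F1, F2) from the centre of the speaker's vowel space; the sketch below shows such a measure with hypothetical formant values, and is not necessarily the metric used in the study.

```python
# Hedged sketch: one common way to quantify vowel peripherality, as the
# Euclidean distance of each token's (F1, F2) from the speaker's vowel-space
# centroid. Formant values are hypothetical, not taken from J-AESOP.
import numpy as np

def peripherality(tokens_hz, centroid_hz):
    """Distance of each (F1, F2) token from the vowel-space centroid."""
    t = np.asarray(tokens_hz, float)
    c = np.asarray(centroid_hz, float)
    return np.linalg.norm(t - c, axis=1)

# Hypothetical unstressed-vowel tokens and a speaker's vowel-space centroid.
tokens   = [(520, 1650), (460, 1500), (600, 1750)]   # (F1, F2) in Hz
centroid = (500, 1550)

print(np.round(peripherality(tokens, centroid), 1))
```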
{"title":"A study of the production of unstressed vowels by Japanese speakers of English using the J-AESOP corpus","authors":"Kakeru Yazawa, Yumi Ozaki, Greg Short, M. Kondo, Y. Sagisaka","doi":"10.1109/ICSDA.2015.7357872","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357872","url":null,"abstract":"This study investigated the production of English unstressed vowels by Japanese speakers based on the J-AESOP corpus. Four acoustic features associated with unstressed vowels in English were examined: duration, intensity, fundamental frequency and vowel quality. Comparative analysis with native English speakers' production revealed that the Japanese speakers achieved good control of all the acoustic features except vowel quality, supporting the results of previous studies. Their attainment of nearnative control of duration, intensity and fundamental frequency can be attributed to positive L1 transfer from Japanese. As for vowel quality, the present study showed that the quality of the Japanese speakers' unstressed vowels was more peripheral than that of the native English speakers' because of the difficulty of L2 phoneme acquisition and the influence of orthography.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127563473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A study on adaptation of speaking rate-dependent hierarchical prosodic model for Chinese dialect TTS
Pub Date: 2015-10-01 | DOI: 10.1109/ICSDA.2015.7357862
Chen-Yu Chiang
This paper presents a new approach to developing a speaking-rate (SR)-dependent hierarchical prosodic model (SR-HPM) for use in an SR-controlled TTS system for Taiwanese (Min-Nan), a resource-limited Chinese dialect. The main challenge is building the SR-HPM directly from a Taiwanese database with sparse coverage of linguistic context, prosody and SR. Exploiting the fact that Taiwanese and Mandarin Chinese share many linguistic characteristics, we propose an adaptation approach that constructs a Taiwanese SR-HPM from a small Taiwanese corpus of fast speech with the help of an existing Mandarin SR-HPM well trained on a large Mandarin corpus covering a wide range of SR. The proposed method has two parts: adaptation of the normalization functions (NFs), and an adaptive prosody labeling and modeling (PLM) algorithm. Both parts are formulated as MAP estimations with the existing Mandarin SR-HPM serving as an informative prior. The effectiveness of the approach was evaluated in a prosody-generation experiment for Taiwanese TTS using a small corpus of fast speech with SR of 4.5-6.8 syllables/sec. Experimental results showed that the generated prosody sounded quite natural for SRs over a wide range of 3.4-6.8 syllables/sec.
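The adaptation is described as MAP estimation with the Mandarin SR-HPM as an informative prior; the sketch below gives the textbook MAP update for a Gaussian mean under a conjugate prior, as a general illustration rather than the paper's exact SR-HPM formulation, and uses hypothetical numbers.

```python
# Hedged sketch: textbook MAP update of a Gaussian mean, with a well-trained
# source-language parameter acting as the prior mean. This illustrates the
# general idea only, not the paper's exact SR-HPM parameterization.
import numpy as np

def map_mean(prior_mean, tau, data):
    """MAP estimate of a Gaussian mean with conjugate prior weight tau
    (tau acts like a pseudo-count of prior observations)."""
    data = np.asarray(data, float)
    n = data.size
    return (tau * prior_mean + n * data.mean()) / (tau + n)

# Hypothetical example: a prosodic-state mean F0 offset (in semitones).
mandarin_prior = 2.4                          # from the large Mandarin corpus
taiwanese_obs  = [1.1, 0.8, 1.5, 1.2, 0.9]    # sparse Taiwanese observations
print(round(map_mean(mandarin_prior, tau=10.0, data=taiwanese_obs), 2))
```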
{"title":"A study on adaptation of speaking rate-dependent hierarchical prosodic model for Chinese dialect TTS","authors":"Chen-Yu Chiang","doi":"10.1109/ICSDA.2015.7357862","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357862","url":null,"abstract":"This paper presents a new approach to developing a speaking rate (SR)-dependent hierarchical prosodic model (SR-HPM) to be utilized in a SR-controlled TTS for Taiwanese (Min-Nan) language, a resource-limited Chinese dialect. The main issue is to conquer the difficulty of building the SR-HPM directly from a Taiwanese database with sparse coverage of linguistic context, prosody and SR. By using the property that Taiwanese and Mandarin Chinese share the same linguistic characteristics, we propose an adaptation approach to constructing Taiwanese SR-HPM from a small Taiwanese corpus of fast SR with the help of an existing Mandarin SRHPM which is well-trained from a large Mandarin corpus with utterances covering a wide range of SR. The proposed method includes two parts: adaptation of normalization functions (NFs) and adaptive prosody labeling and modeling algorithm (PLM). Both of these two parts are formulated based on MAP estimations with the existing Mandarin SR-HPM serving as an informative prior. Effectiveness of the proposed approach was evaluated by an experiment of prosody generation for Taiwanese TTS using a small corpus of fast speech with SR in 4.5-6.8 syllables/sec. Experimental results showed that the generated prosody sounded quite natural for SR in a wide range of 3.4-6.8 syllables/sec.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123148764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objective verification of Assamese consonants
Pub Date: 2015-10-01 | DOI: 10.1109/ICSDA.2015.7357879
T. Basu, Arup Saha, S. Chandra
The aim of this study is to experimentally verify the characteristics of Assamese consonants. All previous work has been carried out subjectively; this study is the first attempt to verify the characteristics of Assamese consonants objectively. Standard colloquial Assamese has eight vowels, two semivowels and twenty-one consonants; semivowels are excluded from this study. The place and manner of articulation are studied using an electropalatography system together with acoustic data for each phoneme, and the phonation process is determined with an electroglottograph. The study reveals that Assamese contrasts three distinct places of articulation: the lips, the alveolar ridge and the velum. It is also observed that alveolar plosives in the context of the vowels /u/ and /a/ show signs of retroflexion.
{"title":"Objective verification of Assamese consonants","authors":"T. Basu, Arup Saha, S. Chandra","doi":"10.1109/ICSDA.2015.7357879","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357879","url":null,"abstract":"The aim of the study is to experimentally verify the characteristics of Assamese consonants. All the previous work has been done subjectively. This study is the first attempt to objectively verify the characteristics of Assamese consonants. There are eight vowels, two semivowels and twenty one consonants in standard colloquial Assamese. Semivowels are excluded in this study. The study of place and manner of articulation is based on the Electropalatography system and acoustic data of the phoneme. Phonation process is determined by Electroglottograph. The study reveals that in Assamese there are contrasts in three distinct places of articulation: the lips, the alveolar ridge and the velum. It is also observed that alveolar plosives in context with vowel /u/ and /a/ show signature of retroflection.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126195127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}