2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE): Latest Publications
Cross-linguistic perception of Chinese attitudes praising and blaming
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357875
Ping Tang, Lei Liu, Shanpeng Li, Wentao Gu
This study compared the perception of Chinese sentences conveying the attitudinal contrast of praising versus blaming across five groups of subjects: Chinese natives, Japanese L2 learners of Mandarin, French L2 learners of Mandarin, and Japanese and French subjects with no knowledge of Mandarin. Context-elicited target sentences conveying a praising, blaming or neutral attitude were used as stimuli in a listening experiment. Subjects rated each stimulus on a five-point scale (obviously praiseful, somewhat praiseful, neutral, somewhat critical, obviously critical), and their evaluations were recorded and analyzed. The results showed that (a) Chinese subjects performed best among all groups, and L2 learners performed better than naïve subjects with no knowledge of Chinese; and (b) French subjects tended to under-evaluate the praising attitude compared with Chinese and Japanese subjects. Correlations between the subjects' evaluations and acoustic parameters known to be important for conveying affect were then examined, and the groups' discrepancies in perceiving these attitudes are discussed in that light.
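The analysis details are not given in the abstract; as a rough illustration only, the Python sketch below shows how such rating-acoustics correlations could be computed. All stimulus ratings and acoustic values in it are hypothetical placeholders, not data from the study.

```python
# Hedged sketch: correlating listeners' attitude ratings with acoustic
# parameters of the stimuli. All values are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr

# Mean rating per stimulus on the 5-point scale
# (1 = obviously critical ... 5 = obviously praiseful), hypothetical values.
ratings = np.array([4.6, 3.9, 3.1, 2.2, 1.5, 4.1, 2.8, 1.9])

# Hypothetical acoustic measurements for the same stimuli.
acoustics = {
    "mean_f0_hz":        np.array([245, 230, 210, 195, 180, 238, 205, 188]),
    "mean_intensity_db": np.array([68, 66, 64, 61, 59, 67, 63, 60]),
    "duration_s":        np.array([1.8, 1.9, 2.0, 2.2, 2.3, 1.8, 2.1, 2.2]),
}

for name, values in acoustics.items():
    r, p = pearsonr(ratings, values)
    print(f"{name:18s} r = {r:+.2f}, p = {p:.3f}")
```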
{"title":"Cross-linguistic perception of Chinese attitudes praising and blaming","authors":"Ping Tang, Lei Liu, Shanpeng Li, Wentao Gu","doi":"10.1109/ICSDA.2015.7357875","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357875","url":null,"abstract":"This study compared the perceptions of Chinese sentences conveying the attitudinal contrast of praising and blaming by five groups of subjects (Chinese natives, Japanese L2 learners of Mandarin, French L2 learners of Mandarin, Japanese and French subjects without any Mandarin ability). Context-elicited target sentences conveying praising, blaming or neutral attitude were used as stimuli in the listening experiment. The subjects were asked to give their evaluations of the stimuli based on a five-point scale (obviously praiseful, somewhat praiseful, neutral, somewhat critical and obviously critical). The subjects' evaluations were recorded and analyzed. The results of the listening experiment shown that (a) Chinese subjects performed best among all subjects in the listening experiment, and L2 learners performed better than naíve subjects without any Chinese ability; also, (b) French subjects tended to under-evaluated the attitude of praising compared to Chinese and Japanese subjects. The correlations between the subjects' evaluations and certain acoustic parameters which were demonstrated to be important in regard to conveying affects were examined, by which the subjects' discrepancies in perceiving these attitudes were further discussed.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131446402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
English Rhythm of Guangxi Zhuang EFL learners
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357867
Jing Shu, Yirong Luo, Yang Yang, Jing Li, Difang Zhou
This study explores the English rhythm of EFL learners through a phonetic experiment on a reading-aloud task performed by English learners from the Guangxi Zhuang Autonomous Region. The subjects are grouped into first-year non-major learners (NM1), second-year non-major learners (NM2), first-year major learners (M1) and second-year major learners (M2). On the basis of segmental analyses, four acoustic rhythm metrics are calculated: the proportion of vocalic intervals (%V) and consonantal variability (ΔC) proposed by Ramus et al. (1999), and the vocalic normalized Pairwise Variability Index (nPVI) and intervocalic raw Pairwise Variability Index (rPVI) proposed by Grabe and Low (2002). The findings show that the English-major learners improve more than the non-major learners in rhythmic production, suggesting that systematic instruction in explicit phonetic and phonological knowledge, together with comprehensive L2 input and sufficient L2 output, helps to bridge the marked rhythmic difference between the learners' L1 and L2.
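The four metrics named above have standard definitions, so a minimal sketch of how they could be computed from hand-segmented vocalic and consonantal interval durations may be helpful; the interval values used here are hypothetical.

```python
# Hedged sketch: the four rhythm metrics named in the abstract, computed
# from interval durations (in seconds). Interval values are hypothetical.
import numpy as np

def percent_v(vocalic, consonantal):
    """%V: proportion of utterance duration that is vocalic (Ramus et al. 1999)."""
    v, c = np.sum(vocalic), np.sum(consonantal)
    return 100.0 * v / (v + c)

def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations."""
    return np.std(consonantal, ddof=1)

def npvi(intervals):
    """Normalized Pairwise Variability Index (Grabe & Low 2002)."""
    d = np.asarray(intervals, dtype=float)
    pairs = np.abs(d[1:] - d[:-1]) / ((d[1:] + d[:-1]) / 2.0)
    return 100.0 * np.mean(pairs)

def rpvi(intervals):
    """Raw Pairwise Variability Index (Grabe & Low 2002)."""
    d = np.asarray(intervals, dtype=float)
    return np.mean(np.abs(d[1:] - d[:-1]))

# Hypothetical segmentation of one read sentence.
vocalic     = [0.09, 0.14, 0.07, 0.18, 0.11, 0.06]
consonantal = [0.08, 0.05, 0.12, 0.07, 0.10]

print(f"%V     = {percent_v(vocalic, consonantal):.1f}")
print(f"DeltaC = {delta_c(consonantal) * 1000:.1f} ms")
print(f"nPVI-V = {npvi(vocalic):.1f}")
print(f"rPVI-C = {rpvi(consonantal) * 1000:.1f} ms")
```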
{"title":"English Rhythm of Guangxi Zhuang EFL learners","authors":"Jing Shu, Yirong Luo, Yang Yang, Jing Li, Difang Zhou","doi":"10.1109/ICSDA.2015.7357867","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357867","url":null,"abstract":"The study intends to explore EFL learners' Rhythm of English through a phonetic experiment over the reading-aloud task produced by the English learners from Guangxi Zhuang Autonomous Region. The tested subjects are grouped as the first-year non-major learners(NM1), the second-year non-major learners(NM2), the first-year major learners(M1) and the second-year major learners(M2). On the basis of the segmental analyses, four acoustic rhythmic indices are calculated: vowel quantity (V%) and consonantal variance (Δ C) proposed by Ramus et al (1999), vocalic normalized Pairwise Variability Index (nPVI) and intervocalic raw Pairwise Variability Index (rPVI) proposed by Grabe and Low (2002). The findings show that the English-major learners have a greater improvement than the non-major learners in the rhythmic production, suggesting that systematic instruction over the explicit phonetic and phonological knowledge together with comprehensive L2 input and sufficient L2 output helps to bridge the rhythmic marked difference between the learners' L1 and L2.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"68 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114120451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contrastive study of focus phonetic realization between Jinan dialect and Taiyuan dialect
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357863
Duan Wenjun, Jia Yuan
Focus is generally considered to bear a communicative function in discourse, and each language has its own ways of realizing it. This paper compares focus realization in the Jinan and Taiyuan dialects. It investigates the similarities and differences in focus realization by examining variations in mean F0, duration and intensity under focused and unfocused conditions in the two dialects. The data show that F0 is the acoustic correlate most closely associated with focus in both dialects. Both dialects compress the pitch range in the post-focus region while leaving the pitch range of pre-focus syllables unchanged; in addition, focus in the Jinan dialect is realized by expanding the pitch range of on-focus syllables, whereas in the Taiyuan dialect the pitch range of on-focus syllables shows no significant expansion. Furthermore, no obvious differences in syllable duration or intensity between the focused and unfocused conditions are found in either dialect.
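As a hedged illustration of the kind of focused/unfocused comparison described above, the sketch below runs a paired comparison of mean F0 over hypothetical syllable measurements; it is not the paper's actual analysis or data.

```python
# Hedged sketch: comparing a syllable-level acoustic measure between the
# focused and unfocused renditions of the same sentences. All values are
# hypothetical; the paper's actual statistics may differ.
import numpy as np
from scipy import stats

# Hypothetical mean F0 (Hz) of on-focus syllables vs. the same syllables
# produced in the neutral (unfocused) condition.
f0_focused   = np.array([231, 248, 225, 240, 236, 252, 229, 244])
f0_unfocused = np.array([210, 222, 205, 215, 212, 226, 208, 219])

# Paired comparison, since each sentence is produced in both conditions.
t, p = stats.ttest_rel(f0_focused, f0_unfocused)
rise = np.mean(f0_focused - f0_unfocused)
print(f"mean on-focus F0 rise: {rise:.1f} Hz, t = {t:.2f}, p = {p:.4f}")
```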
{"title":"Contrastive study of focus phonetic realization between Jinan dialect and Taiyuan dialect","authors":"Duan Wenjun, Jia Yuan","doi":"10.1109/ICSDA.2015.7357863","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357863","url":null,"abstract":"It is usually considered that focus bears communicative function in discourse, each language has its own ways to realize focus. This paper compares the focus realization of Jinan dialect and Taiyuan dialect. It aims to investigate the similarity and difference of focus realization through examining the variations of mean F0, duration and intensity in both focused and unfocused conditions between these two dialects. The data represents that F0 is the closest acoustic correlate with focus in both dialects. Suppressing the pitch range in the post-focus region and leaving the pitch range of the pre-focus syllables unchanged exist in all the dialects; meanwhile, focus in Jinan dialect is also realized by expanding the pitch range of all the syllables on-focus, but in Taiyuan dialect the pitch range of all the syllables on-focus doesn't observed significant expanding. Furthermore, no obvious differences in syllable duration and intensity between the focused and unfocused condition are discovered in all these dialects.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114305850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward improving estimation accuracy of emotion dimensions in bilingual scenario based on three-layered model
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357858
Xingfeng Li, M. Akagi
This paper proposes a newly revised three-layered model to improve the estimation of emotion dimensions (valence, activation) in a bilingual scenario, using knowledge of the commonalities and differences in human perception across languages. Most previous speech emotion recognition systems worked only within a single language. To construct a generalized emotion recognition system that can detect emotions across multiple languages, however, acoustic feature selection and feature normalization across languages remain open issues. In this study, features correlated with the emotion dimensions are selected to construct the proposed model. To imitate emotion perception across languages, a novel normalization method is introduced that extracts the direction and distance from neutral to each other emotion in the emotion dimensional space. Results show that the proposed system yields mean absolute error reductions of 46% and 34% for Japanese and German, respectively, over the previous system, attaining estimation performance closer to human evaluation in the bilingual case.
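The abstract describes normalization by the direction and distance from neutral in the emotion dimensional space; the sketch below shows one possible reading of that idea in a two-dimensional valence-activation space, with hypothetical coordinate values.

```python
# Hedged sketch: representing an emotional utterance by its direction and
# distance from the neutral centroid in valence-activation space, as one
# possible reading of the normalization described in the abstract.
import numpy as np

def direction_distance(point, neutral):
    """Return (unit direction vector, Euclidean distance) from neutral."""
    diff = np.asarray(point, float) - np.asarray(neutral, float)
    dist = np.linalg.norm(diff)
    direction = diff / dist if dist > 0 else np.zeros_like(diff)
    return direction, dist

# Hypothetical mean (valence, activation) ratings on a [-1, 1] scale.
neutral_centroid = np.array([0.05, -0.02])
joy_utterance    = np.array([0.70, 0.55])

direction, distance = direction_distance(joy_utterance, neutral_centroid)
print("direction:", np.round(direction, 2), "distance:", round(distance, 2))
```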
{"title":"Toward improving estimation accuracy of emotion dimensions in bilingual scenario based on three-layered model","authors":"Xingfeng Li, M. Akagi","doi":"10.1109/ICSDA.2015.7357858","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357858","url":null,"abstract":"This paper proposes a newly revised three-layered model to improve emotion dimensions (valence, activation) estimation for bilingual scenario, using knowledge of commonalities and differences of human perception among multiple languages. Most of previous systems on speech emotion recognition only worked in each mono-language. However, to construct a generalized emotion recognition system which be able to detect emotions for multiple languages, acoustic features selection and feature normalization among languages remained a topic. In this study, correlated features with emotion dimensions are selected to construct proposed model. To imitate emotion perception across languages, a novel normalization method is addressed by extracting direction and distance from neutral to other emotion in emotion dimensional space. Results show that the proposed system yields mean absolute error reduction rate of 46% and 34% for Japanese and German language respectively over previous system. The proposed system attains estimation performance more comparable to human evaluation on bilingual case.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125162353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development of new speech corpus for elderly Japanese speech recognition
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357859
Y. Iribe, N. Kitaoka, Shuhei Segawa
We have constructed a new speech corpus from the utterances of 100 elderly Japanese people in order to improve speech recognition accuracy for the speech of older adults. Humanoid robots are being developed for use in elder-care nursing homes. Interaction with such robots is expected to help maintain the cognitive abilities of nursing home residents, as well as providing them with companionship. For these robots to interact with elderly people through spoken dialogue, a high-performance speech recognition system for elderly speech is needed. To develop such a system, we recorded speech from 100 elderly Japanese, most of whom live in nursing homes, with an average age of 77.2 years. An earlier seniors' speech corpus, S-JNAS, had participants with an average age of 67.6 years, whereas the target age for nursing home care is around 75, considerably higher than in the S-JNAS samples. In this paper we compare our new corpus with an existing Japanese read-speech corpus of adult speech, JNAS, and with the above-mentioned S-JNAS, the senior version of JNAS.
{"title":"Development of new speech corpus for elderly Japanese speech recognition","authors":"Y. Iribe, N. Kitaoka, Shuhei Segawa","doi":"10.1109/ICSDA.2015.7357859","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357859","url":null,"abstract":"We have constructed a new speech data corpus, using the utterances of 100 elderly Japanese people, to improve speech recognition accuracy of the speech of older people. Humanoid robots are being developed for use in elder care nursing homes. Interaction with such robots is expected to help maintain the cognitive abilities of nursing home residents, as well as providing them with companionship. In order for these robots to interact with elderly people through spoken dialogue, a high performance speech recognition system for speech of elderly people is needed. To develop such a system, we recorded speech uttered by 100 elderly Japanese, most of them are living in nursing homes, with an average age of 77.2. Previously, a seniors' speech corpus named S-JNAS was developed, but the average age of the participants was 67.6 years, but the target age for nursing home care is around 75 years old, much higher than that of the SJNAS samples. In this paper we compare our new corpus with an existing Japanese read speech corpus, JNAS, which consists of adult speech, and with the above mentioned S-JNAS, the senior version of JNAS.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124502581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-linguistic prosodic comparison with OMProDat database
Pub Date: 2015-12-17 | DOI: 10.1109/ICSDA.2015.7357894
Hongwei Ding, D. Hirst, R. Hoffmann
OMProDat is an open multilingual prosodic database that aims to collect, archive and distribute recordings and annotations of directly comparable data from languages with different prosodic typological characteristics. OMProDat contains recordings of 40 five-sentence passages read by 5 male and 5 female speakers of each language. The database currently covers five languages: English, French, Mandarin Chinese, Cantonese and Korean. This study introduces the construction of a German subset and describes a prosodic comparison of the resulting six languages. Eighteen melodic metrics were automatically extracted for each language and can discriminate the six languages with an accuracy of 61.88%. Moreover, these melodic metrics also illustrate the dynamic melodic characteristics of the dialect represented in the German database.
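As an illustration of the kind of evaluation that could underlie the 61.88% figure, the sketch below cross-validates a simple linear classifier on per-passage melodic-metric vectors; the feature values are random stand-ins, not OMProDat measurements, and the paper's actual method may differ.

```python
# Hedged sketch: discriminating languages from per-passage melodic metrics
# with a simple classifier. Feature values are random stand-ins, not
# OMProDat data, so the resulting accuracy is only illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
languages = ["en", "fr", "zh", "yue", "ko", "de"]
n_passages, n_metrics = 40, 18

# One 18-dimensional melodic-metric vector per passage per language.
X = np.vstack([rng.normal(loc=i, scale=2.0, size=(n_passages, n_metrics))
               for i in range(len(languages))])
y = np.repeat(languages, n_passages)

scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2%}")
```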
{"title":"Cross-linguistic prosodic comparison with OMProDat database","authors":"Hongwei Ding, D. Hirst, R. Hoffmann","doi":"10.1109/ICSDA.2015.7357894","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357894","url":null,"abstract":"OMProDat is an open multilingual prosodic database, which aims to collect, archive and distribute recordings and annotations of directly comparable data from different languages representing different prosodic typological characteristics. OMProDat contains recordings of 40 five-sentence passages read by 5 male and 5 female speakers of each language. Currently the database contains recordings for five languages: English, French, Mandarin Chinese, Cantonese and Korean. In this study, the generation of German database is introduced, and a prosodic comparison of these six languages is described. 18 melodic metrics were automatically extracted from each language, and can discriminate these six languages with an accuracy rate of 61.88%. Moreover, the dynamic melodic characteristics of the dialect in the German database can also be illustrated with these melodic metrics.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117053023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis on L2 learners' perception errors between geminate and singleton of Japanese consonants using loudness related parameters
Pub Date: 2015-12-14 | DOI: 10.1109/ICSDA.2015.7357889
Yanlong Zhang, Mee Sonu, H. Kato, Y. Sagisaka
To better understand the difficulty that second language (L2) learners have in identifying Japanese geminate versus singleton consonants, a perceptual factor is newly introduced to address the shortcomings of conventional explanations that rely solely on acoustic duration differences. To systematically explain the serious speech-rate-related errors in geminate/singleton identification in fast and slow speech, loudness-related parameters are used to reflect perceptual characteristics of duration. Correlation analyses show that these parameters provide a reasonable explanation for the error patterns obtained in a perceptual experiment on Japanese geminate/singleton consonants with Korean learners. This suggests a new possibility for designing L2 teaching materials for Japanese geminate/singleton consonants in different phonetic contexts, based on the difficulties expected from their loudness differences.
{"title":"Analysis on L2 learners' perception errors between geminate and singleton of Japanese consonants using loudness related parameters","authors":"Yanlong Zhang, Mee Sonu, H. Kato, Y. Sagisaka","doi":"10.1109/ICSDA.2015.7357889","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357889","url":null,"abstract":"For better understanding of the identification difficulties in Japanese geminate/singleton consonants for second language (L2) learners, a perceptual factor is newly introduced to supply the insufficiencies of conventional explanations solely using acoustic duration differences. To systematically explain speech-rate related serious errors of geminate/singleton identification in fast/slow speech, loudness related parameters are used to reflect perceptual characteristics on duration. Correlation analyses have shown that these parameters can provide reasonable explanation for error characteristics obtained in the perceptual experiment on Japanese geminate/singleton consonants by Korean learners. A new possibility is suggested to design L2 teaching materials for Japanese geminate/singleton consonants with different phonetic contexts based on expected difficulties resulting from their loudness differences.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114361092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A study of the production of unstressed vowels by Japanese speakers of English using the J-AESOP corpus
Pub Date: 2015-12-14 | DOI: 10.1109/ICSDA.2015.7357872
Kakeru Yazawa, Yumi Ozaki, Greg Short, M. Kondo, Y. Sagisaka
This study investigated the production of English unstressed vowels by Japanese speakers based on the J-AESOP corpus. Four acoustic features associated with unstressed vowels in English were examined: duration, intensity, fundamental frequency and vowel quality. Comparative analysis with native English speakers' productions revealed that the Japanese speakers achieved good control of all the acoustic features except vowel quality, supporting the results of previous studies. Their attainment of near-native control of duration, intensity and fundamental frequency can be attributed to positive L1 transfer from Japanese. As for vowel quality, the present study showed that the Japanese speakers' unstressed vowels were more peripheral in quality than the native English speakers', owing to the difficulty of L2 phoneme acquisition and the influence of orthography.
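Peripherality of vowel quality is often operationalized as the distance of a token's (F1, F2) from the centre of the speaker's vowel space; the sketch below shows such a measure with hypothetical formant values, and is not necessarily the metric used in the study.

```python
# Hedged sketch: one common way to quantify vowel peripherality, as the
# Euclidean distance of each token's (F1, F2) from the speaker's vowel-space
# centroid. Formant values are hypothetical, not taken from J-AESOP.
import numpy as np

def peripherality(tokens_hz, centroid_hz):
    """Distance of each (F1, F2) token from the vowel-space centroid."""
    t = np.asarray(tokens_hz, float)
    c = np.asarray(centroid_hz, float)
    return np.linalg.norm(t - c, axis=1)

# Hypothetical unstressed-vowel tokens and a speaker's vowel-space centroid.
tokens   = [(520, 1650), (460, 1500), (600, 1750)]   # (F1, F2) in Hz
centroid = (500, 1550)

print(np.round(peripherality(tokens, centroid), 1))
```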
{"title":"A study of the production of unstressed vowels by Japanese speakers of English using the J-AESOP corpus","authors":"Kakeru Yazawa, Yumi Ozaki, Greg Short, M. Kondo, Y. Sagisaka","doi":"10.1109/ICSDA.2015.7357872","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357872","url":null,"abstract":"This study investigated the production of English unstressed vowels by Japanese speakers based on the J-AESOP corpus. Four acoustic features associated with unstressed vowels in English were examined: duration, intensity, fundamental frequency and vowel quality. Comparative analysis with native English speakers' production revealed that the Japanese speakers achieved good control of all the acoustic features except vowel quality, supporting the results of previous studies. Their attainment of nearnative control of duration, intensity and fundamental frequency can be attributed to positive L1 transfer from Japanese. As for vowel quality, the present study showed that the quality of the Japanese speakers' unstressed vowels was more peripheral than that of the native English speakers' because of the difficulty of L2 phoneme acquisition and the influence of orthography.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127563473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A study on adaptation of speaking rate-dependent hierarchical prosodic model for Chinese dialect TTS
Pub Date: 2015-10-01 | DOI: 10.1109/ICSDA.2015.7357862
Chen-Yu Chiang
This paper presents a new approach to developing a speaking-rate (SR)-dependent hierarchical prosodic model (SR-HPM) for use in an SR-controlled TTS system for Taiwanese (Min-Nan), a resource-limited Chinese dialect. The main challenge is building the SR-HPM directly from a Taiwanese database with sparse coverage of linguistic context, prosody and SR. Exploiting the fact that Taiwanese and Mandarin Chinese share many linguistic characteristics, we propose an adaptation approach that constructs a Taiwanese SR-HPM from a small Taiwanese corpus of fast speech with the help of an existing Mandarin SR-HPM well trained on a large Mandarin corpus covering a wide range of SR. The proposed method has two parts: adaptation of the normalization functions (NFs), and an adaptive prosody labeling and modeling (PLM) algorithm. Both parts are formulated as MAP estimations with the existing Mandarin SR-HPM serving as an informative prior. The effectiveness of the approach was evaluated in a prosody-generation experiment for Taiwanese TTS using a small corpus of fast speech with SR of 4.5-6.8 syllables/sec. Experimental results showed that the generated prosody sounded quite natural for SRs over a wide range of 3.4-6.8 syllables/sec.
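The adaptation is described as MAP estimation with the Mandarin SR-HPM as an informative prior; the sketch below gives the textbook MAP update for a Gaussian mean under a conjugate prior, as a general illustration rather than the paper's exact SR-HPM formulation, and uses hypothetical numbers.

```python
# Hedged sketch: textbook MAP update of a Gaussian mean, with a well-trained
# source-language parameter acting as the prior mean. This illustrates the
# general idea only, not the paper's exact SR-HPM parameterization.
import numpy as np

def map_mean(prior_mean, tau, data):
    """MAP estimate of a Gaussian mean with conjugate prior weight tau
    (tau acts like a pseudo-count of prior observations)."""
    data = np.asarray(data, float)
    n = data.size
    return (tau * prior_mean + n * data.mean()) / (tau + n)

# Hypothetical example: a prosodic-state mean F0 offset (in semitones).
mandarin_prior = 2.4                          # from the large Mandarin corpus
taiwanese_obs  = [1.1, 0.8, 1.5, 1.2, 0.9]    # sparse Taiwanese observations
print(round(map_mean(mandarin_prior, tau=10.0, data=taiwanese_obs), 2))
```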
{"title":"A study on adaptation of speaking rate-dependent hierarchical prosodic model for Chinese dialect TTS","authors":"Chen-Yu Chiang","doi":"10.1109/ICSDA.2015.7357862","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357862","url":null,"abstract":"This paper presents a new approach to developing a speaking rate (SR)-dependent hierarchical prosodic model (SR-HPM) to be utilized in a SR-controlled TTS for Taiwanese (Min-Nan) language, a resource-limited Chinese dialect. The main issue is to conquer the difficulty of building the SR-HPM directly from a Taiwanese database with sparse coverage of linguistic context, prosody and SR. By using the property that Taiwanese and Mandarin Chinese share the same linguistic characteristics, we propose an adaptation approach to constructing Taiwanese SR-HPM from a small Taiwanese corpus of fast SR with the help of an existing Mandarin SRHPM which is well-trained from a large Mandarin corpus with utterances covering a wide range of SR. The proposed method includes two parts: adaptation of normalization functions (NFs) and adaptive prosody labeling and modeling algorithm (PLM). Both of these two parts are formulated based on MAP estimations with the existing Mandarin SR-HPM serving as an informative prior. Effectiveness of the proposed approach was evaluated by an experiment of prosody generation for Taiwanese TTS using a small corpus of fast speech with SR in 4.5-6.8 syllables/sec. Experimental results showed that the generated prosody sounded quite natural for SR in a wide range of 3.4-6.8 syllables/sec.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123148764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objective verification of Assamese consonants
Pub Date: 2015-10-01 | DOI: 10.1109/ICSDA.2015.7357879
T. Basu, Arup Saha, S. Chandra
The aim of this study is to experimentally verify the characteristics of Assamese consonants. All previous work has been carried out subjectively; this study is the first attempt to verify the characteristics of Assamese consonants objectively. Standard colloquial Assamese has eight vowels, two semivowels and twenty-one consonants; semivowels are excluded from this study. The place and manner of articulation are studied using an electropalatography system together with acoustic data for each phoneme, and the phonation process is determined with an electroglottograph. The study reveals that Assamese contrasts three distinct places of articulation: the lips, the alveolar ridge and the velum. It is also observed that alveolar plosives in the context of the vowels /u/ and /a/ show signs of retroflexion.
{"title":"Objective verification of Assamese consonants","authors":"T. Basu, Arup Saha, S. Chandra","doi":"10.1109/ICSDA.2015.7357879","DOIUrl":"https://doi.org/10.1109/ICSDA.2015.7357879","url":null,"abstract":"The aim of the study is to experimentally verify the characteristics of Assamese consonants. All the previous work has been done subjectively. This study is the first attempt to objectively verify the characteristics of Assamese consonants. There are eight vowels, two semivowels and twenty one consonants in standard colloquial Assamese. Semivowels are excluded in this study. The study of place and manner of articulation is based on the Electropalatography system and acoustic data of the phoneme. Phonation process is determined by Electroglottograph. The study reveals that in Assamese there are contrasts in three distinct places of articulation: the lips, the alveolar ridge and the velum. It is also observed that alveolar plosives in context with vowel /u/ and /a/ show signature of retroflection.","PeriodicalId":290790,"journal":{"name":"2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126195127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}