2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA): Latest Papers

Design of multi-channel indoor noise database for speech processing in noise
Kwang Myung Jeon, Nam Kyun Kim, Moon Ju Jo, H. Kim
An indoor noise database is essential for the development and assessment of distant speech recognition systems operating in indoor environments. This paper proposes a multi-channel indoor noise database. Each noise signal in the proposed database was recorded using a four-channel linear microphone array located in one corner of a living room in a condominium. Noise sources were generated either by physical actions or by loudspeakers at various positions inside the condominium, and included five different types of TV content and 28 indoor noise sources categorized as repeated, stationary, or moving during the database recording. The indoor noise database was then verified by measuring the direction of arrival of each recorded noise source, which showed that the proposed database is suitable for developing and evaluating multi-channel speech processing algorithms in noisy indoor environments.
DOI: https://doi.org/10.1109/ICSDA.2017.8384418
Published: 2017-11-01
Citations: 0
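The verification step described in the indoor noise abstract above, measuring a direction of arrival (DOA) for each recorded source, can be illustrated with a minimal sketch. The abstract does not state which DOA estimator, sampling rate, or microphone spacing was used, so the plain cross-correlation delay estimate, the far-field two-microphone geometry, and the values 16 kHz and 0.10 m below are all assumptions chosen only for illustration.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive result means `other` is a delayed copy of `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += x * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def doa_degrees(lag_samples, sample_rate, mic_spacing):
    """Convert an inter-microphone delay into a far-field angle from broadside."""
    sin_theta = SPEED_OF_SOUND * lag_samples / (sample_rate * mic_spacing)
    sin_theta = max(-1.0, min(1.0, sin_theta))  # clamp values just outside [-1, 1]
    return math.degrees(math.asin(sin_theta))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical test signal: noise at microphone 1, delayed by 3 samples at microphone 2.
    mic1 = [rng.uniform(-1.0, 1.0) for _ in range(256)]
    mic2 = [0.0] * 3 + mic1
    lag = estimate_delay(mic1, mic2, max_lag=8)
    print(lag, doa_degrees(lag, sample_rate=16000, mic_spacing=0.10))
```

With a four-channel array, the same pairwise estimate could be repeated for each microphone pair and the angles compared against the known source position; a practical system would more likely use a weighted cross-correlation in the frequency domain.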
Development of a Vietnamese speech recognition system for Viettel call center
Quoc Bao Nguyen, Van Hai Do, Ba Quyen Dam, Minh Hung Le
In this paper, we first present our effort to collect an 85.8-hour corpus of Vietnamese conversational telephone speech from our Viettel call center. After that, various techniques, such as a time delay deep neural network (TDNN) with sequence training and data augmentation, are applied to build the speech recognition system. Our final system achieves a low word error rate of 17.44% on this challenging corpus. To the best of our knowledge, this is the first attempt to build a Vietnamese corpus and speech recognition system for the customer service domain.
DOI: https://doi.org/10.1109/ICSDA.2017.8384456
Published: 2017-11-01
Citations: 4
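The word error rate quoted in the Viettel abstract above is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; the function name and the whitespace tokenization are assumptions, and the abstract does not describe the authors' scoring tool.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not real corpus data.
    print(word_error_rate("a b c d", "a x c"))
```

A corpus-level figure is normally computed by summing the edit distances and the reference lengths over all utterances before dividing, rather than averaging per-utterance rates.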
Corpus-based evaluation of Chinese text normalization
Sunhee Kim
This paper presents a method of developing a corpus consisting of various categories of Non-Standard Words (NSWs), together with a representative test set to be used for evaluating the text normalization modules proposed for Standard Mandarin and Taiwanese Mandarin. A total of 191,431 sentences with NSWs are extracted for Standard Mandarin and a total of 731,524 sentences with NSWs are extracted for Taiwanese Mandarin. In order to make a representative test set, 1,000 sentences for Standard Mandarin and Taiwanese Mandarin are randomly chosen from these sentences, maintaining the same proportion of the source corpus as well as a similar proportion of each category of NSWs.
DOI: https://doi.org/10.1109/ICSDA.2017.8384473
Published: 2017-11-01
Citations: 1
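Drawing a random test set that keeps a similar proportion of each NSW category, as the text normalization abstract above describes, amounts to proportional stratified sampling. The sketch below is one possible way to do it, using largest-remainder rounding; the function name, the rounding rule, and the category labels in the usage example are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, sample_size, seed=0):
    """Draw `sample_size` items so each category keeps roughly its source share."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    total = len(items)
    if not 0 <= sample_size <= total:
        raise ValueError("sample_size must be between 0 and len(items)")

    # Largest-remainder allocation: floor each exact quota, then hand the
    # remaining slots to the categories with the largest fractional parts.
    exact = {cat: sample_size * len(group) / total for cat, group in groups.items()}
    quota = {cat: int(value) for cat, value in exact.items()}
    leftover = sample_size - sum(quota.values())
    by_remainder = sorted(exact, key=lambda cat: exact[cat] - quota[cat], reverse=True)
    for cat in by_remainder[:leftover]:
        quota[cat] += 1

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = []
    for cat, group in groups.items():
        sample.extend(rng.sample(group, quota[cat]))
    return sample


if __name__ == "__main__":
    # Hypothetical categories and sizes, for illustration only.
    sentences = (
        [("date", i) for i in range(60)]
        + [("money", i) for i in range(30)]
        + [("phone", i) for i in range(10)]
    )
    picked = stratified_sample(sentences, key=lambda s: s[0], sample_size=10)
    print(sorted(cat for cat, _ in picked))
```

A sentence containing NSWs from more than one category would need a rule for assigning it to a single stratum, which this sketch does not address.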
The Seoul Corpus, spontaneous speech in Seoul Korean
Weonhee Yun
The Seoul Corpus is a spontaneous speech corpus of Seoul Korean, fully segmented and annotated at several levels in the Praat TextGrid format. A total of 40 speakers, balanced for age and sex, participated in the recordings. Each was interviewed about various topics for an hour; the recordings were first labeled by forced alignment using HTK and then fine-tuned by human labelers. The corpus includes about 220,000 phrasal words and 1,135,263 labeled phoneme tokens. It has already been distributed to the research community free of charge.
DOI: https://doi.org/10.1109/ICSDA.2017.8384420
Published: 2017-11-01
Citations: 0
The aspects of stop voicing in L1 and Korean-spoken L2 Englishes in regards to the place of articulation
JeeSok Lee, S. Rhee
The aim of this study is to compare the stop consonants [b], [d], and [g] as produced by native speakers of English and by Korean EFL speakers, particularly with respect to the occurrence of vocal fold vibration during the stop closure. The study will examine whether stop voicing in the onset and coda positions is influenced by the place of articulation. Based on K-SEC (Korean-Spoken English Corpus), the analysis is to use i) Korean speakers' productions of isolated words that have the voiced stops [b], [d], and [g] as onsets, followed by six different vowels [i], [e], [s], [a], [o], and [u], and ii) the same voiced stops as codas preceded by the aforementioned vowels. Aspects of the initial and final stop voicing manifested by native speakers are also to be analyzed and then compared with those of the Korean learners of English.
DOI: https://doi.org/10.1109/ICSDA.2017.8384462
Published: 2017-11-01
Citations: 0
On the usages of conditional clauses in Japanese maptask dialogue
Yoshiko Kawabata, Toshihiko Matsuka, Yasuharu Den
The present study examined how four well-known particles of Japanese conditional clauses, namely TARA, TO, BA, and NARA, are actually used, by analyzing the Japanese Map Task Dialogue Corpus. We found clear differences in how they were used. In particular, different particles were used to refer to different contents of the main clauses. We argue that these differences are caused by differences in the knowledge that speakers try to share with hearers, and we introduce discourse functions of the particles on the basis of those differences.
DOI: https://doi.org/10.1109/ICSDA.2017.8384454
Published: 2017-11-01
Citations: 0
A progress report of the Taiwan Mandarin radio speech corpus project
Y. Liao, Y. Chang, Sing-Yue Wang, Jhih-wei Chen, Sheng-Ming Wang, Jenq-Haur Wang
The Taiwan Mandarin Radio Speech Corpus contains 300 (and growing) hours of high-quality recordings selected from Taiwan's National Education Radio (NER) archive. The corpus features speech (of various speaking styles, produced by hundreds of speakers) and their corresponding transcriptions (automatically transcribed and manually corrected) and annotations, which are suitable for speech and language research. In this paper, we report the progress of the corpus development and especially show the experimental results of audio event detection/segmentation and semi-supervised acoustic model training on this corpus.
DOI: https://doi.org/10.1109/ICSDA.2017.8384450
Published: 2017-11-01
Citations: 6
Speech emotion recognition from Indonesian spoken language using acoustic and lexical features
Pipin Kurniawati, D. Lestari, M. L. Khodra
This paper describes our work extending previous work on emotion recognition for spoken Indonesian. In this research, we construct an Indonesian emotional corpus (IDEC). In constructing the corpus, we aim at natural emotional occurrences from television talk shows. IDEC is used to construct the emotion recognizer using two main feature types, acoustic and lexical. The Support Vector Machine (SVM), Random Forest (RF), and Multinomial Naive Bayes (MNB) algorithms are employed to model the emotions. Experimental results show that SVM outperforms the RF and MNB algorithms. It achieves an average F-measure of 0.713 for 6 emotion classes by combining both acoustic and lexical features.
DOI: https://doi.org/10.1109/ICSDA.2017.8384467
Published: 2017-11-01
Citations: 4
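The average F-measure reported in the Indonesian emotion recognition abstract above can be illustrated with a short sketch. The abstract does not say how the average over the 6 classes was taken, so the unweighted (macro) average used here is an assumption, as are the function name and the emotion labels in the usage example.

```python
def macro_f_measure(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels, not the classes used in IDEC.
    truth = ["happy", "happy", "sad", "sad"]
    guess = ["happy", "sad", "sad", "sad"]
    print(macro_f_measure(truth, guess))
```

A macro average treats every class equally regardless of how many utterances it has; a support-weighted average would give a different number on an imbalanced corpus.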
Diphthongized vowels in the Xiuning Hui Chinese Dialect
Minghui Zhang, Fang Hu
This paper gives an acoustic phonetic description of the diphthongized vowels in the Xiuning Hui Chinese dialect in terms of temporal structure, spectral properties, and dynamic aspects. The results suggest that diphthongized vowels in Xiuning function as an intermediate vowel category between monophthongs and diphthongs. Comparisons among the Xiuning, Yi county Hui, and Qimen Hui cases reveal that the process of diphthongization is gradient in Hui dialects.
DOI: https://doi.org/10.1109/ICSDA.2017.8384458
Published: 2017-11-01
Citations: 1
Chinese TIMIT: A TIMIT-like corpus of standard Chinese
Jiahong Yuan, Hongwei Ding, Sishi Liao, Yuqing Zhan, M. Liberman
This paper describes an effort to build a TIMIT-like corpus in Standard Chinese, which is part of our "Global TIMIT" project. Three steps are involved and detailed in the paper: selection of sentences; speaker recruitment and recording; and phonetic segmentation. The corpus consists of 6000 sentences read by 50 speakers (25 females and 25 males). Phonetic segmentation obtained from forced alignment is provided, which has 93.2% agreement (of phone boundaries) within 20 ms compared to manual segmentation on 50 randomly selected sentences. Statistics on the number of tokens and mean duration of phones and tones in the corpus are also reported. Males have shorter phones/tones but more and longer utterance-internal silences than females, demonstrating that males in this dataset speak faster but pause more frequently and for longer.
DOI: https://doi.org/10.1109/ICSDA.2017.8384463
Published: 2017-11-01
Citations: 8
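The boundary agreement figure in the Chinese TIMIT abstract above (the share of phone boundaries from forced alignment that fall within 20 ms of the manual ones) can be computed along the following lines. This is a minimal sketch that assumes both segmentations contain the same phone sequence, so boundaries pair up one to one; the function name and the times in the usage example are hypothetical.

```python
def boundary_agreement(manual, automatic, tolerance=0.020):
    """Fraction of paired boundaries (in seconds) that differ by at most `tolerance`."""
    if len(manual) != len(automatic):
        raise ValueError("both segmentations must have the same number of boundaries")
    if not manual:
        raise ValueError("at least one boundary is required")
    hits = sum(1 for m, a in zip(manual, automatic) if abs(m - a) <= tolerance)
    return hits / len(manual)


if __name__ == "__main__":
    # Hypothetical boundary times in seconds, not taken from the corpus.
    manual_times = [0.100, 0.250, 0.400, 0.610]
    aligned_times = [0.105, 0.290, 0.395, 0.600]
    print(boundary_agreement(manual_times, aligned_times))
```

If the two segmentations could differ in the number of phones, the boundaries would first have to be matched, for example by aligning the phone label sequences, before a tolerance check like this one is meaningful.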