2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA) — Latest Publications
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384418
Kwang Myung Jeon, Nam Kyun Kim, Moon Ju Jo, H. Kim
An indoor noise database is essential for the development and assessment of distant speech recognition systems operating in indoor environments. This paper proposes a multi-channel indoor noise database. Each noise signal in the database was recorded using a four-channel linear microphone array located in one corner of the living room of a condominium. Noise sources were generated either by physical actions or by loudspeakers at various positions inside the condominium, covering five different TV contents and 28 indoor noise sources categorized as repeated, stationary, or moving. The database was then verified by measuring the direction of arrival of each recorded noise source, which showed that it is suitable for developing and evaluating multi-channel speech processing algorithms in noisy indoor environments.
Title: Design of multi-channel indoor noise database for speech processing in noise
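The paper verifies the recordings by estimating each source's direction of arrival. The abstract does not say which DOA method the authors used; a common approach for a microphone pair in a linear array is GCC-PHAT: estimate the inter-channel time delay from the phase-weighted cross-spectrum, then map the delay to an arrival angle under a far-field assumption. The sketch below illustrates that generic technique only (function names and the mic spacing are illustrative, not from the paper):

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the time delay (seconds) of sig relative to ref via GCC-PHAT."""
    n = len(sig) + len(ref)                      # zero-pad to avoid circular wrap
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12               # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift    # lag of the correlation peak
    return shift / fs

def doa_degrees(tau, mic_distance, c=343.0):
    """Map a pairwise delay to a far-field arrival angle for one mic pair."""
    arg = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(arg))
```

For a four-channel linear array, one would typically average or triangulate the angles obtained from several mic pairs.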
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384456
Quoc Bao Nguyen, Van Hai Do, Ba Quyen Dam, Minh Hung Le
In this paper, we first present our effort to collect an 85.8-hour corpus of Vietnamese conversational telephone speech from our Viettel call center. Various techniques, such as a time-delay deep neural network (TDNN) with sequence training and data augmentation, are then applied to build the speech recognition system. Our final system achieves a word error rate of 17.44% on this challenging corpus. To the best of our knowledge, this is the first attempt to build a Vietnamese corpus and speech recognition system for the customer service domain.
Title: Development of a Vietnamese speech recognition system for Viettel call center
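The 17.44% figure is a word error rate: the word-level Levenshtein distance (substitutions + deletions + insertions) between the hypothesis and the reference transcript, divided by the number of reference words. A minimal implementation of the standard metric (not the authors' scoring tool):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    r, h = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                              # all deletions
    for j in range(len(h) + 1):
        d[0][j] = j                              # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

For example, one substitution plus one deletion against a four-word reference yields a WER of 0.5.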
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384473
Sunhee Kim
This paper presents a method for developing a corpus consisting of various categories of Non-Standard Words (NSWs) together with a representative test set for evaluating text normalization modules for Standard Mandarin and Taiwanese Mandarin. A total of 191,431 sentences with NSWs are extracted for Standard Mandarin and 731,524 sentences with NSWs for Taiwanese Mandarin. To build a representative test set, 1,000 sentences each for Standard Mandarin and Taiwanese Mandarin are randomly chosen from these sentences, maintaining the same proportions as the source corpus as well as a similar proportion of each NSW category.
Title: Corpus-based evaluation of Chinese text normalization
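Sampling a test set while preserving per-category proportions is stratified sampling: allocate each NSW category a quota proportional to its share of the source corpus, then sample uniformly within each category. The paper does not give its sampling code; a generic sketch of the idea (with rounding, the total can drift from k by a sentence or two):

```python
import random
from collections import defaultdict

def stratified_sample(sentences, categories, k, seed=0):
    """Draw roughly k sentences, with per-category quotas proportional
    to each category's share of the full collection."""
    by_cat = defaultdict(list)
    for sent, cat in zip(sentences, categories):
        by_cat[cat].append(sent)
    total = len(sentences)
    rng = random.Random(seed)
    sample = []
    for cat, items in sorted(by_cat.items()):
        quota = round(k * len(items) / total)     # proportional allocation
        sample.extend(rng.sample(items, min(quota, len(items))))
    return sample
```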
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384420
Weonhee Yun
The Seoul Corpus is a spontaneous speech corpus of Seoul Korean, fully segmented with several levels of annotation in the Praat TextGrid format. A total of 40 speakers, balanced for age and sex, participated in the recordings. Each was interviewed on various topics for an hour, and the recordings were labeled first by forced alignment using HTK and then fine-tuned by human labelers. About 220,000 phrasal words are included, and 1,135,263 phoneme tokens were labeled. The corpus has already been distributed to the research community free of charge.
Title: The Seoul Corpus, spontaneous speech in Seoul Korean
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384462
JeeSok Lee, S. Rhee
The aim of this study is to compare aspects of the stop consonants [b], [d], and [g], particularly the occurrence of vocal fold vibration during the stop closure, as produced by native speakers of English and by Korean EFL speakers. We examine whether stop voicing in onset and coda positions is influenced by the place of articulation. Based on K-SEC (Korean-Spoken English Corpus), the analysis uses i) Korean speakers' productions of isolated words that have the voiced stops [b], [d], and [g] as onsets, followed by six different vowels [i], [e], [s], [a], [o], and [u], and ii) the same voiced stops as codas, preceded by the aforementioned vowels. Aspects of initial and final stop voicing produced by native speakers are also analyzed and then compared with those of the Korean learners of English.
Title: The aspects of stop voicing in L1 and Korean-spoken L2 Englishes in regards to the place of articulation
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384454
Yoshiko Kawabata, Toshihiko Matsuka, Yasuharu Den
The present study examined how four well-known particles of Japanese conditional clauses, namely TARA, TO, BA, and NARA, are actually used, by analyzing the Japanese Map Task Dialogue Corpus. We found clear differences in how they are used; in particular, different particles are used to refer to different contents of the main clauses. We argue that these differences arise from differences in the knowledge that speakers try to share with hearers, and we propose discourse functions of the particles on that basis.
Title: On the usages of conditional clauses in Japanese maptask dialogue
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384450
Y. Liao, Y. Chang, Sing-Yue Wang, Jhih-wei Chen, Sheng-Ming Wang, Jenq-Haur Wang
The Taiwan Mandarin Radio Speech Corpus contains 300 (and growing) hours of high-quality recordings selected from the archive of Taiwan's National Education Radio (NER). The corpus features speech of various speaking styles, produced by hundreds of speakers, with corresponding transcriptions (automatically transcribed and manually corrected) and annotations, making it suitable for speech and language research. In this paper, we report the progress of the corpus development and, in particular, present experimental results on audio event detection/segmentation and semi-supervised acoustic model training on this corpus.
Title: A progress report of the Taiwan Mandarin radio speech corpus project
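The abstract mentions audio event detection/segmentation but not the method. A common baseline for segmenting broadcast audio is frame-level log-energy thresholding with runs of active frames merged into segments; the sketch below shows that baseline only (frame sizes and the threshold are illustrative assumptions, not values from the paper):

```python
import numpy as np

def energy_segments(signal, fs, frame_ms=25, hop_ms=10, threshold_db=-40.0):
    """Mark frames whose log energy exceeds a threshold and merge
    consecutive active frames into (start_sec, end_sec) segments."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    segs, start = [], None
    for i in range(0, len(signal) - frame + 1, hop):
        energy = np.mean(signal[i:i + frame] ** 2)
        db = 10 * np.log10(energy + 1e-12)       # avoid log(0) on silence
        if db > threshold_db and start is None:
            start = i                            # segment opens
        elif db <= threshold_db and start is not None:
            segs.append((start / fs, i / fs))    # segment closes
            start = None
    if start is not None:
        segs.append((start / fs, len(signal) / fs))
    return segs
```

Real broadcast segmenters typically add smoothing, minimum-duration constraints, and a classifier to label each segment (speech, music, jingle, etc.).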
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384467
Pipin Kurniawati, D. Lestari, M. L. Khodra
This paper describes our work extending previous work on emotion recognition for spoken Indonesian. We construct an Indonesian emotional corpus (IDEC), targeting natural emotional occurrences in television talk shows. IDEC is used to build an emotion recognizer based on two main feature types, acoustic and lexical. Support Vector Machine (SVM), Random Forest (RF), and Multinomial Naive Bayes (MNB) algorithms are employed to model the emotions. Experimental results show that SVM outperforms RF and MNB, achieving an average F-measure of 0.713 over 6 emotion classes when combining acoustic and lexical features.
Title: Speech emotion recognition from Indonesian spoken language using acoustic and lexical features
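Combining acoustic and lexical features usually means early fusion: concatenating per-utterance acoustic statistics with a lexical vector (e.g. bag of words) into one feature vector before classification. The sketch below shows that fusion idea with a deliberately simple nearest-centroid classifier standing in for the paper's SVM/RF/MNB models; all names and the toy vocabulary are illustrative:

```python
import numpy as np

def fuse_features(acoustic, text, vocab):
    """Concatenate acoustic statistics with a bag-of-words lexical vector."""
    lex = np.zeros(len(vocab))
    for word in text.split():
        if word in vocab:
            lex[vocab[word]] += 1.0
    return np.concatenate([np.asarray(acoustic, dtype=float), lex])

def nearest_centroid(train_x, train_y, x):
    """Classify x by Euclidean distance to per-class mean feature vectors."""
    labels = sorted(set(train_y))
    centroids = {c: np.mean([v for v, y in zip(train_x, train_y) if y == c],
                            axis=0)
                 for c in labels}
    return min(labels, key=lambda c: np.linalg.norm(x - centroids[c]))
```

In practice, one would replace the classifier with an SVM and normalize the two feature groups so neither dominates the fused vector.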
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384458
Minghui Zhang, Fang Hu
This paper gives an acoustic-phonetic description of the diphthongized vowels in the Xiuning Hui Chinese dialect in terms of temporal structure, spectral properties, and dynamics. The results suggest that diphthongized vowels in Xiuning function as an intermediate vowel category between monophthongs and diphthongs. Comparisons among the Xiuning, Yi County Hui, and Qimen Hui cases reveal that the process of diphthongization is gradient in Hui dialects.
Title: Diphthongized vowels in the Xiuning Hui Chinese Dialect
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384463
Jiahong Yuan, Hongwei Ding, Sishi Liao, Yuqing Zhan, M. Liberman
This paper describes an effort to build a TIMIT-like corpus of Standard Chinese as part of our "Global TIMIT" project. Three steps are detailed in the paper: selection of sentences; speaker recruitment and recording; and phonetic segmentation. The corpus consists of 6,000 sentences read by 50 speakers (25 female and 25 male). Phonetic segmentation obtained by forced alignment is provided; on 50 randomly selected sentences, 93.2% of its phone boundaries agree with manual segmentation within 20 ms. Statistics on the number of tokens and the mean duration of phones and tones in the corpus are also reported. Males have shorter phones/tones but more and longer utterance-internal silences than females, indicating that males in this dataset speak faster but pause more frequently and longer.
Title: Chinese TIMIT: A TIMIT-like corpus of standard Chinese
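The 93.2%-within-20-ms figure is a standard boundary-agreement metric: the fraction of manually placed phone boundaries for which the forced aligner placed a boundary within the tolerance. A minimal version of that metric (the authors' exact matching procedure is not specified in the abstract):

```python
def boundary_agreement(auto_bounds, manual_bounds, tol=0.020):
    """Fraction of manual boundaries (in seconds) that have an
    automatically aligned boundary within tol seconds."""
    hits = sum(
        any(abs(a - m) <= tol for a in auto_bounds)
        for m in manual_bounds
    )
    return hits / len(manual_bounds)
```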