
2009 Oriental COCOSDA International Conference on Speech Database and Assessments: Latest Publications

Message from the Oriental-COCOSDA convener
Pub Date : 2011-11-28 DOI: 10.1109/ICSDA.2011.6085965
Chiu-yu Tseng
Welcome to Oriental-COCOSDA 2011 at Hsinchu, Taiwan. This is the 14th annual conference of Oriental-COCOSDA, the Oriental chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment. Since 1998, the annual meetings have been held in Tsukuba, Taipei, Beijing, Jeju, Hua Hin, Singapore, Delhi, Jakarta, Penang, Hanoi, Beijing, Kyoto, Kathmandu, and this year Hsinchu. I would like to thank the colleagues from Taiwan, headed by Conference Chair Professor Hsiao-Chuan Wang, for making the event possible this time in Taiwan.
Citations: 0
An HMM-based Vietnamese speech synthesis system
Pub Date : 2009-10-02 DOI: 10.1109/ICSDA.2009.5278366
T. Vu, Mai Chi Luong, Satoshi Nakamura
This paper describes an approach to the realization of a Vietnamese speech synthesis system applying a technique whereby speech is directly synthesized from Hidden Markov models (HMMs). Spectrum, pitch, and phone duration are simultaneously modeled in HMMs and their parameter distributions are clustered independently by using decision tree-based context clustering algorithms. Several contextual factors such as tone types, syllables, words, phrases, and utterances were determined and are taken into account to generate the spectrum, pitch, and state duration. The resulting system yields significant correctness for a tonal language, and a fair reproduction of the prosody.
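The decision-tree context clustering mentioned in the abstract can be illustrated with a minimal sketch. The contexts (tone, position) and duration values below are made up for illustration and are not the paper's actual feature set; each split simply picks the yes/no question on the context that most reduces within-cluster spread.

```python
# Toy decision-tree context clustering: choose the question whose yes/no
# split best separates the observed phone durations (hypothetical data).

def sse(xs):
    """Unnormalized spread: sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_split(samples, questions):
    """samples: list of (context_dict, value); questions: name -> predicate."""
    best_name, best_score = None, sse([v for _, v in samples])
    for name, q in questions.items():
        yes = [v for c, v in samples if q(c)]
        no = [v for c, v in samples if not q(c)]
        if not yes or not no:
            continue  # a split must produce two non-empty clusters
        score = sse(yes) + sse(no)  # within-cluster spread after the split
        if score < best_score:
            best_name, best_score = name, score
    return best_name, best_score

samples = [  # (context, phone duration in ms) -- illustrative values only
    ({"tone": 1, "pos": "initial"}, 80.0),
    ({"tone": 1, "pos": "final"}, 120.0),
    ({"tone": 4, "pos": "initial"}, 70.0),
    ({"tone": 4, "pos": "final"}, 115.0),
]
questions = {
    "is_final": lambda c: c["pos"] == "final",
    "is_tone1": lambda c: c["tone"] == 1,
}
name, score = best_split(samples, questions)
print(name)  # prints "is_final": phrase position explains duration better than tone here
```

A real system (e.g. HTS-style training) asks hundreds of such questions per stream and grows the tree greedily with a likelihood-based stopping criterion; the selection step above is the core idea.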
Citations: 36
A chain of Gaussian Mixture Model for text-independent speaker recognition
Pub Date : 2009-10-02 DOI: 10.1109/ICSDA.2009.5278367
Yanxiang Chen, Ming Liu
Text-independent speaker recognition offers better flexibility than text-dependent methods. However, due to differences in phonetic content, text-independent methods usually achieve lower performance than text-dependent ones. To combine the flexibility of the text-independent approach with the high performance of the text-dependent approach, we propose a new modeling technique, a chain of Gaussian Mixture Models, which encodes the temporal correlation of the training utterance in a chain structure. A special decoding network is then used to evaluate the test utterance and find the best phonetically matched segments between the test utterance and the training utterance. The experimental results indicate that the proposed method significantly improves system performance, especially for short test utterances.
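The GMM scoring step that underlies such speaker models can be sketched as follows. This is a generic diagonal-covariance mixture log-likelihood with toy parameters; the paper's chain structure and decoding network are not reproduced here.

```python
import math

# Log-likelihood of a feature vector under a diagonal-covariance GMM,
# the basic score used in GMM-based speaker recognition (toy parameters).

def log_gauss_diag(x, mean, var):
    """Log density of x under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, weights, means, vars_):
    """Mixture log-likelihood via log-sum-exp for numerical stability."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, vars_)]
    mx = max(logs)
    return mx + math.log(sum(math.exp(l - mx) for l in logs))

weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
vars_ = [[1.0, 1.0], [1.0, 1.0]]
print(gmm_loglik([0.1, -0.2], weights, means, vars_))  # ≈ -2.556
```

In practice a speaker's score for an utterance is the average frame log-likelihood under the speaker's GMM, often normalized by a universal background model.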
Citations: 0
An undergraduate Mandarin speech database for speaker recognition research
Pub Date : 2009-10-02 DOI: 10.1109/ICSDA.2009.5278370
Hong Wang, Jingui Pan
This paper describes the development of a new speech database for speaker recognition research, UMSD (undergraduate Mandarin speech database). UMSD contains a total of 12 sessions of utterances for each of the 24 selected undergraduate students, with recordings conducted at different session intervals. The phonetically balanced corpus content includes isolated digits (0∼9), digit strings (5 phone numbers and 2 postal codes), words and phrases of lengths from 1 to 10 characters (10 for each length), the Chinese Phonetic Alphabet Table (21 Initials and 35 Finals), 2 ancient poems, and a 200-word paragraph extracted from a well-known essay. Additionally, to effectively extract and process speech segments of interest from UMSD, a speech database management system based on MATLAB and MS-ACCESS has been developed. Results of a preliminary evaluation show that the performance attained with UMSD is good: it not only meets the needs of our recent work on text-dependent and text-independent speaker recognition, but its multi-session records with different session intervals also allow further research on long-term intra-speaker variability.
Citations: 3
Research on Uyghur framenet description system
Pub Date : 2009-10-02 DOI: 10.1109/ICSDA.2009.5278358
Alifu Kuerban, Wumaierjiang Kuerban, Nijat Abdurusul
This article presents a preliminary discussion of, and an attempt at, a frame-semantics description system and its content for the Uyghur language. It describes the composition of the framenet according to the description content, describes and classifies the semantic roles of the frame elements of the modern Uyghur framenet, and determines the semantic role labeling system, laying a solid foundation for Uyghur framenet syntactic and semantic recognition and analysis. It also explores a feasible, cognition-based method and approach for building the Uyghur framenet.
Citations: 2
An investigation on the Mandarin prosody of a parallel multi-speaking rate speech corpus
Pub Date : 2009-10-02 DOI: 10.1109/ICSDA.2009.5278360
Chen-Yu Chiang, C. Tang, Hsiu-Min Yu, Yih-Ru Wang, Sin-Horng Chen
In this paper, the prosody of a parallel multi-speaking-rate Mandarin read speech corpus is investigated. The corpus contains four parallel speech datasets uttered by a female professional announcer at speech rates (SRs) of 4.40 (fast), 3.82 (normal), 2.97 (median) and 2.45 (slow) syllables/second. Using the unsupervised joint prosody labeling and modeling (PLM) method proposed previously, the relationships between SR and various prosodic features, including pause duration, patterns of three high-level prosodic constituents, and break labels, are investigated. The analyses reported in this study could be very informative for developing a prosody generation mechanism for text-to-speech and prosody models for automatic speech recognition at various SRs.
Citations: 7
Modeling characteristics of agglutinative languages with Multi-class language model for ASR system
Pub Date : 2009-10-02 DOI: 10.1109/ICSDA.2009.5278368
I. Dawa, Y. Sagisaka, S. Nakamura
In this paper, we discuss a new language model that considers the characteristics of agglutinative languages. We used Mongolian (the Cyrillic-script Mongolian used in Mongolia) as an example from which to build the language model. We developed a Multi-class N-gram language model based on similar-word clustering that focuses on the variable suffixes of Mongolian words. Applying the proposed language model with the ATRASR engine, the resulting recognition system improves performance by 6.85% compared with a conventional word N-gram. We also confirmed that the new model will be convenient for rapid development of an ASR system for resource-deficient languages, especially agglutinative languages such as Mongolian.
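The class-based N-gram idea can be sketched as a bigram probability factored through word classes: P(w2 | w1) = P(class(w2) | class(w1)) * P(w2 | class(w2)). The tiny Mongolian word list and hand-assigned classes below are a toy illustration, not the paper's suffix-based clustering.

```python
from collections import Counter

# Toy class-based bigram model (hypothetical class assignments and corpus).
word2class = {"ном": "NOUN", "бичиг": "NOUN", "унших": "VERB", "бичих": "VERB"}
corpus = [("ном", "унших"), ("бичиг", "унших"), ("ном", "бичих")]  # (w1, w2) pairs

# Counts needed for the two factors of the class-based bigram.
class_bigrams = Counter((word2class[a], word2class[b]) for a, b in corpus)
class_counts = Counter(word2class[a] for a, _ in corpus)   # history-class counts
word_counts = Counter(b for _, b in corpus)                # predicted-word counts
class_totals = Counter(word2class[b] for _, b in corpus)   # predicted-class counts

def prob(w1, w2):
    """P(w2 | w1) = P(c2 | c1) * P(w2 | c2)."""
    c1, c2 = word2class[w1], word2class[w2]
    p_class = class_bigrams[(c1, c2)] / class_counts[c1]
    p_word = word_counts[w2] / class_totals[c2]
    return p_class * p_word

print(prob("ном", "унших"))  # → 0.6666666666666666
```

Sharing statistics across a class is what helps with agglutinative languages: rare suffixed forms inherit the transition statistics of their class instead of needing their own bigram counts.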
Citations: 1
Emphasized speech synthesis based on hidden Markov models
Pub Date : 2009-10-02 DOI: 10.1109/ICSDA.2009.5278371
Kumiko Morizane, Keigo Nakamura, T. Toda, H. Saruwatari, K. Shikano
This paper presents a statistical approach to synthesizing emphasized speech based on hidden Markov models (HMMs). Context-dependent HMMs are trained using emphasized speech data uttered by intentionally emphasizing an arbitrary accentual phrase in a sentence. To model acoustic characteristics of emphasized speech, new contextual factors describing an emphasized accentual phrase are additionally considered in model training. Moreover, to build HMMs for synthesizing both normal speech and emphasized speech, we investigate two training methods; one is training of individual models for normal and emphasized speech using each of these two types of speech data separately; and the other is training of a mixed model using both of them simultaneously. The experimental results demonstrate that 1) HMM-based speech synthesis is effective for synthesizing emphasized speech and 2) the mixed model allows a more compact HMM set generating more naturally sounding but slightly less emphasized speech compared with the individual models.
Citations: 21
Acoustic manifestations of information categories in Standard Chinese
Pub Date : 2009-10-02 DOI: 10.1109/ICSDA.2009.5278372
Yuan Jia, Ai-jun Li
The present study mainly investigates the acoustic manifestations of various information categories in Standard Chinese (hereinafter, SC). Results of experiments have demonstrated that rheme focus, theme focus, rheme background and theme background can be reflected by different acoustic realizations. Specifically, rheme focus and theme focus can induce F0 and duration prominences, and the former exerts more obvious variations. Although rheme background and theme background introduce no prominences, the former can be manifested by greater magnitude of acoustic performances than the latter.
Citations: 0
Toward translating Indonesian spoken utterances to/from other languages
Pub Date : 2009-10-02 DOI: 10.1109/ICSDA.2009.5278362
S. Sakti, Michael Paul, R. Maia, S. Sakai, Noriyuki Kimura, Yutaka Ashikari, E. Sumita, Satoshi Nakamura
This paper outlines the National Institute of Information and Communications Technology / Advanced Telecommunications Research Institute International (NICT/ATR) research activities in developing a spoken language translation system, especially for translating Indonesian spoken utterances into and from Japanese or English. Since the NICT/ATR Japanese-English speech translation system is well established and has been widely known for many years, our focus here is only on the additional components related to Indonesian spoken language technology. These include an Indonesian large-vocabulary continuous speech recognizer, Indonesian-Japanese and Indonesian-English machine translators, and an Indonesian speech synthesizer. Each of these component technologies was developed using corpus-based speech and language processing approaches. Currently, all these components have been successfully incorporated into the mobile terminal of the NICT/ATR multilingual speech translation system.
Citations: 3