
2012 8th International Symposium on Chinese Spoken Language Processing: Latest publications

Bayesian nonparametric language models
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423460
Ying-Lang Chang, Jen-Tzung Chien
Backoff smoothing and topic modeling are crucial issues in n-gram language models. This paper presents a Bayesian nonparametric learning approach to tackle these two issues. We develop a topic-based language model in which the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM) with power-law behavior. This model can be simplified to the hierarchical PY (HPY) LM by disregarding the topic information, and further to the modified Kneser-Ney (MKN) LM by also disregarding the Bayesian treatment. In the experiments, the proposed THPY-LM outperforms state-of-the-art MKN-LM and HPY-LM baselines.
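To make the backoff structure concrete, here is a minimal sketch of the hierarchical Pitman-Yor predictive rule that HPY-LM rests on, and which reduces to interpolated Kneser-Ney when the strength parameter is zero. It is not the authors' THPY-LM: the seating arrangement is collapsed to one table per observed word type, the discount `d` and strength `theta` are fixed rather than inferred, and topics are ignored.

```python
from collections import defaultdict

class HPYSketch:
    """Hierarchical Pitman-Yor predictive probabilities with the seating
    arrangement collapsed to one table per observed word type (with
    theta = 0 this reduces to interpolated Kneser-Ney)."""

    def __init__(self, vocab, d=0.75, theta=0.5):
        self.vocab = list(vocab)
        self.d, self.theta = d, theta
        # context tuple -> word -> count
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, tokens, order=2):
        for i, w in enumerate(tokens):
            for n in range(order):
                if i - n < 0:
                    break
                self.counts[tuple(tokens[i - n:i])][w] += 1

    def prob(self, w, ctx):
        # Back off recursively; the empty context backs off to a uniform base.
        parent = self.prob(w, ctx[1:]) if ctx else 1.0 / len(self.vocab)
        c = self.counts[ctx]
        total = sum(c.values())
        if total == 0:
            return parent
        types = len(c)
        discounted = max(c.get(w, 0) - self.d, 0.0) / (self.theta + total)
        backoff_mass = (self.theta + self.d * types) / (self.theta + total)
        return discounted + backoff_mass * parent

lm = HPYSketch(vocab={"a", "b", "c"})
lm.observe("a b a c a b".split())
print(lm.prob("b", ("a",)))   # P(b | a) with Pitman-Yor discounting and backoff
```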
Citations: 2
A real-time tone enhancement method for continuous Mandarin speeches
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423534
Ye Tian, Jia Jia, Yongxin Wang, Lianhong Cai
Mandarin Chinese is a tonal language, and the tone perception ability of people with sensorineural hearing loss (SNHL) is often weaker than that of people with normal hearing. To help SNHL listeners better perceive and distinguish tone information in Chinese speech, we focus on a real-time tone enhancement method for continuous Mandarin speech. In this paper, based on an experimental investigation of the acoustic features most related to tone perception, we propose a practical tone enhancement model that employs unified features independent of Chinese tonal patterns. Using this model, we further implement a real-time tone enhancement method that avoids syllable segmentation and tonal pattern recognition. In tone identification tests with normal-hearing and SNHL listeners under both quiet and noisy backgrounds, the enhanced speech achieves an average 5% higher correct rate than the original speech. The time delay of the enhancement method can be kept within 800 ms, so it can be used in hearing aids to benefit SNHL listeners in their daily lives.
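The abstract does not give the enhancement formula itself, so the sketch below only illustrates the general idea of tone enhancement by F0 manipulation: expanding the frame-level F0 contour about a running local mean. The function name, the expansion factor `alpha`, and the window length are illustrative assumptions, not the authors' model, which works with tone-independent unified features.

```python
import numpy as np

def expand_f0_contour(f0, alpha=1.5, window=50):
    """Exaggerate an F0 contour about its local mean.

    f0     : frame-level F0 values in Hz (0 for unvoiced frames)
    alpha  : expansion factor (> 1 widens the pitch excursions)
    window : running-mean window in frames

    Generic illustration of F0-based tone enhancement only; not the paper's model.
    """
    f0 = np.asarray(f0, dtype=float)
    out = f0.copy()
    voiced = np.flatnonzero(f0 > 0)
    for i in voiced:
        lo, hi = max(0, i - window // 2), min(len(f0), i + window // 2 + 1)
        seg = f0[lo:hi]
        base = seg[seg > 0].mean()            # crude local baseline pitch
        out[i] = base + alpha * (f0[i] - base)
    return np.clip(out, 50.0, 500.0)          # keep the result in a plausible F0 range

# Toy contour: a falling tone made steeper by the expansion.
contour = np.linspace(220, 180, 30)
print(expand_f0_contour(contour, alpha=1.5)[[0, 15, 29]])
```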
Citations: 0
Information allocation and prosodic expressiveness in continuous speech: A Mandarin cross-genre analysis
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423535
Chiu-yu Tseng, Chao-yu Su
Assuming that, in addition to discourse association, the allocation of key information is an important feature of the prosodic expressiveness of continuous speech, common accentuation patterns are derived across 3 Mandarin speech genres using 4 degrees of perceived emphasis. Using frequency count as a further control, it is found that only 6 types of emphasis patterns are needed to account for 70% of the speech data regardless of genre. The 6 emphasis types are further compared with respect to (1) the distribution of discourse units and emphasis tokens by speech genre, (2) emphasis patterns by phrase, and (3) discourse positions, to see whether genre-specific features can be found. Results reveal that genre-dependent features can also be accounted for. In addition, individual genre properties are found to correlate with phrase length and specific emphasis patterns.
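The 70% figure is a cumulative-frequency statement; a small sketch of how such a coverage count is computed from per-phrase pattern labels is shown below (the labels `P1`...`P6` are placeholders, not the paper's pattern inventory).

```python
from collections import Counter

def patterns_for_coverage(pattern_tokens, threshold=0.70):
    """Return the smallest set of most-frequent pattern types whose token
    frequencies together cover at least `threshold` of the data."""
    counts = Counter(pattern_tokens)
    total = sum(counts.values())
    covered, chosen = 0, []
    for pattern, count in counts.most_common():
        chosen.append(pattern)
        covered += count
        if covered / total >= threshold:
            break
    return chosen, covered / total

# Synthetic pattern labels for illustration only.
tokens = ["P1"] * 40 + ["P2"] * 20 + ["P3"] * 15 + ["P4"] * 10 + ["P5"] * 8 + ["P6"] * 7
print(patterns_for_coverage(tokens))   # (['P1', 'P2', 'P3'], 0.75)
```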
Citations: 3
Spoken term detection for OOV terms based on triphone confusion matrix
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423480
Yong Xu, Wu Guo, Shan Su, Lirong Dai
The search for out-of-vocabulary (OOV) query terms in the spoken term detection (STD) task is addressed in this paper. Phone-level fragments with word-position markers are naturally adopted as the speech recognition decoding units. A triphone confusion matrix (TriCM) is then used to expand the query space to compensate for speech recognition errors. We also propose a new approach to constructing the triphone confusion matrix with a smoothing method similar to the Katz method, which alleviates the data sparseness problem. Experimental results on the NIST STD06 evaluation-set conversational telephone speech (CTS) corpus indicate that the triphone confusion matrix provides a relative improvement of 12% in actual term-weighted value (ATWV).
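As an illustration of confusion-matrix-based query expansion, the sketch below expands an OOV query's phone sequence into its most likely confusable variants. A monophone confusion table with invented probabilities stands in for the paper's smoothed triphone matrix, and a simple beam replaces lattice search.

```python
import heapq
import math

# Toy monophone confusion matrix: P(decoded phone | intended phone).
# The entries are invented for illustration, not estimated from data.
CONFUSION = {
    "b": {"b": 0.85, "p": 0.10, "d": 0.05},
    "a": {"a": 0.90, "e": 0.10},
    "t": {"t": 0.80, "d": 0.15, "k": 0.05},
}

def expand_query(phones, beam=5):
    """Return the `beam` most likely confusable variants of a phone sequence,
    scored by summed log confusion probabilities."""
    hyps = [(0.0, [])]                            # (negative log prob, partial sequence)
    for ph in phones:
        next_hyps = []
        for cost, seq in hyps:
            for alt, p in CONFUSION.get(ph, {ph: 1.0}).items():
                next_hyps.append((cost - math.log(p), seq + [alt]))
        hyps = heapq.nsmallest(beam, next_hyps)   # keep the best `beam` variants
    return [(" ".join(seq), math.exp(-cost)) for cost, seq in hyps]

for variant, prob in expand_query(["b", "a", "t"]):
    print(f"{variant:8s} {prob:.3f}")
```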
Citations: 1
Self documentation of endangered languages
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423541
S. Dhakhwa, J. Allwood
Several minority languages are on the verge of extinction in Nepal, especially when they lack a generally accepted writing system and are spoken in areas where Nepali (the official language) is predominantly used. Lohorung is an example; it is spoken among the Lohorung Rai communities of Sankhuwasabha, a hilly district of eastern Nepal. Older generations of Lohorung speakers are experts in the language but have limited ability to read and write English or Nepali. The documentation of Lohorung and other similarly endangered languages is important. If the right tools and techniques are used, we believe that self-documentation is one of the best ways to document a language. We have developed an online platform that community members can use to collaboratively self-document their language. The platform is a multimodal dictionary authoring and browsing tool, developed with a focus on usability, ease of use and productivity.
Citations: 0
More targets? Simulating emotional intonation of mandarin with PENTA
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423546
Ai-jun Li, Qiang Fang, Yuan Jia, J. Dang
This study attempts to evaluate the validity of the PENTA model in simulating the emotional intonation of Mandarin Chinese. Based on previous analyses of Mandarin emotional intonation, it is suggested that two target tones are needed under some conditions, such as the successive addition boundary tone (SUABT). This is a new encoding scheme in the Chinese PENTA model, in which one syllable carries two tonal targets: one for the lexical tone and the other for the expressive tone. Numeric and perceptual assessments of the new encoding scheme's performance in simulating Mandarin emotional intonation are carried out. The results indicate that two targets are necessary and effective for simulating boundary tones; in other words, the new encoding scheme is required to realize certain emotional boundary tones by setting two target tones to convey expressive emotions.
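As a rough illustration of what "two tonal targets in one syllable" means, the sketch below drives an F0 contour toward a sequence of pitch targets with a first-order exponential approach. PENTA/qTA itself uses a third-order critically damped system, so this is only a simplified stand-in; the target values, frame counts, and rate constant are invented.

```python
import numpy as np

def approximate_targets(targets, f0_start=200.0, frames_per_target=15, rate=0.25):
    """First-order approximation of a sequence of pitch targets.

    Each target is (height_in_Hz, slope_in_Hz_per_frame).  The contour moves
    exponentially toward the current target from wherever the previous segment
    left it -- a deliberately simplified stand-in for the third-order system
    used by PENTA/qTA.
    """
    f0, current = [], f0_start
    for height, slope in targets:
        for t in range(frames_per_target):
            goal = height + slope * t
            current += rate * (goal - current)   # exponential approach to the target
            f0.append(current)
    return np.array(f0)

# A final syllable with two successive targets, as in the SUABT case discussed
# above: a falling lexical-tone target followed by a rising boundary tone.
contour = approximate_targets([(220.0, -2.0), (180.0, +3.0)])
print(contour.round(1))
```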
Citations: 2
A study on cross-language knowledge integration in Mandarin LVCSR
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423528
Chen-Yu Chiang, S. Siniscalchi, Yih-Ru Wang, Sin-Horng Chen, Chin-Hui Lee
We present a cross-language knowledge integration framework to improve the performance of large vocabulary continuous speech recognition. Two types of knowledge sources, manner attributes and prosodic structure, are incorporated. For manner of articulation, cross-lingual attribute detectors trained with an American English corpus (WSJ0) are utilized to verify and rescore hypothesized Mandarin syllables in word lattices obtained with state-of-the-art systems. For prosodic structure, models trained with an unsupervised joint prosody labeling and modeling technique on a Mandarin corpus (TCC300) are used in lattice rescoring. Experimental results on Mandarin syllable, character and word recognition with the TCC300 corpus show that the proposed approach significantly outperforms the baseline system, which does not use articulatory and prosodic information. It also demonstrates the potential of utilizing the outputs of cross-lingual attribute detectors as a language-universal front-end for automatic speech recognition.
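The rescoring step can be pictured as a log-linear combination of the baseline recognizer score with the extra knowledge sources. The sketch below does this over an N-best list rather than a word lattice, with invented scores and weights; it illustrates the combination, not the paper's lattice rescoring procedure.

```python
def rescore(nbest, attr_weight=0.3, prosody_weight=0.2):
    """Pick the hypothesis with the best log-linear combination of the baseline
    log score and the attribute/prosody verification log scores."""
    best = max(
        nbest,
        key=lambda h: h["baseline"]
                      + attr_weight * h["attribute"]
                      + prosody_weight * h["prosody"],
    )
    return best["text"]

# Invented N-best entries: the second has a better baseline score, but the
# attribute and prosody scores flip the decision toward the first.
nbest = [
    {"text": "今天 天氣 很 好", "baseline": -120.4, "attribute": -8.1, "prosody": -5.0},
    {"text": "今天 天期 很 好", "baseline": -119.9, "attribute": -12.6, "prosody": -7.4},
]
print(rescore(nbest))
```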
Citations: 9
Robust voice activity detection using empirical mode decomposition and modulation spectrum analysis
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423519
Y. Kanai, M. Unoki
Voice activity detection (VAD) is used to detect speech/non-speech periods in observed signals. However, current VAD techniques have a serious problem in that the accuracy of speech-period detection drops drastically when they are used on noisy speech and/or on mixtures of speech and non-speech such as music and environmental sounds. VAD therefore needs to be robust so that speech periods can be accurately detected in these situations. This paper proposes an approach to robust VAD using empirical mode decomposition (EMD) and modulation spectrum analysis (MSA) to resolve these problems. The idea is to reduce background noise with EMD, without estimating the SNR (noise conditions), and then to determine speech/non-speech periods with MSA. Three experiments on VAD in real environments were conducted to evaluate the proposed method by comparing it with typical methods (Otsu's and G.729B). The results demonstrated that the proposed method detects speech periods more accurately than the typical methods.
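The modulation-spectrum half of the method can be sketched as a band-energy ratio on the frame-level envelope: speech concentrates envelope modulation around the syllable rate (roughly 2-16 Hz), whereas stationary noise does not. The sketch below assumes the EMD-based noise reduction has already been applied to the input, and the band limits and toy signals are illustrative assumptions, not the paper's detector.

```python
import numpy as np

def modulation_energy_ratio(x, fs, frame_ms=20, band=(2.0, 16.0)):
    """Fraction of envelope-modulation energy inside `band` (Hz); a high ratio
    suggests a speech segment.  Sketches only the MSA stage of the method."""
    frame = max(1, int(fs * frame_ms / 1000))
    n_frames = len(x) // frame
    # Frame-level RMS envelope, sampled at 1000 / frame_ms Hz.
    env = np.sqrt(np.mean(x[:n_frames * frame].reshape(n_frames, frame) ** 2, axis=1))
    env = env - env.mean()
    spec = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), d=frame_ms / 1000.0)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = spec[1:].sum() + 1e-12                 # ignore the DC bin
    return spec[in_band].sum() / total

fs = 8000
t = np.arange(fs * 2) / fs
speech_like = np.sin(2 * np.pi * 150 * t) * (0.6 + 0.4 * np.sin(2 * np.pi * 4 * t))
noise_like = np.random.default_rng(0).normal(size=fs * 2)
print(modulation_energy_ratio(speech_like, fs))   # higher: 4 Hz modulation is in band
print(modulation_energy_ratio(noise_like, fs))    # lower: flat modulation spectrum
```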
Citations: 3
Two objective measures for speech distortion and noise reduction evaluation of enhanced speech signals
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423488
H. Ding, Tan Lee, I. Soon
Although speech distortion and noise reduction are two key metrics for evaluating enhanced speech quality, few of the existing objective measures give a clear, specific indication of either one. In this paper, two objective measurement tools are proposed to separately evaluate the capability of a speech enhancement filter in terms of recovering the clean speech and reducing the noise. Several common speech enhancement algorithms are evaluated with these objective measures as well as with subjective listening tests. The correlation between the objective and subjective results clearly shows the effectiveness of the proposed objective measures in evaluating the quality of enhanced speech signals.
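One generic way to measure the two effects separately is "shadow filtering": apply the enhancer's gains to the clean speech and to the noise individually, then score speech distortion and noise reduction from the two outputs. The sketch below does this under the strong simplifying assumption of one scalar gain per frame; it illustrates the separation idea, not the specific measures proposed in the paper.

```python
import numpy as np

def distortion_and_noise_reduction(clean, noise, gains, frame=160):
    """Speech-distortion and noise-reduction scores, in dB, for a gain-based
    enhancer reduced to one scalar gain per frame (a strong simplification)."""
    n_frames = min(len(clean), len(noise)) // frame
    sd_num = sd_den = nr_num = nr_den = 0.0
    for k in range(n_frames):
        s = clean[k * frame:(k + 1) * frame]
        n = noise[k * frame:(k + 1) * frame]
        g = gains[k]
        sd_num += np.sum((g * s - s) ** 2)    # energy of the speech error
        sd_den += np.sum(s ** 2)
        nr_num += np.sum(n ** 2)              # noise energy before enhancement
        nr_den += np.sum((g * n) ** 2)        # residual noise energy after
    speech_distortion_db = 10 * np.log10(sd_num / sd_den + 1e-12)
    noise_reduction_db = 10 * np.log10(nr_num / (nr_den + 1e-12))
    return speech_distortion_db, noise_reduction_db

rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 200 * np.arange(16000) / 8000)
noise = 0.3 * rng.normal(size=16000)
gains = np.full(16000 // 160, 0.8)            # a crude fixed-gain "enhancer"
print(distortion_and_noise_reduction(clean, noise, gains))
```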
Citations: 1
Phonotactic spoken language recognition: Using diversely adapted acoustic models in parallel phone recognizers
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423509
C. Leung, B. Ma, Haizhou Li
In phonotactic spoken language recognition systems, acoustic model adaptation prior to phone lattice decoding has been adopted to deal with the mismatch between training and test conditions. Moreover, combining diversified phonotactic features is common practice. These observations motivate an in-depth investigation of combining diversified phonotactic features obtained from diversely adapted acoustic models. Our experiments show that the approach achieves an equal error rate (EER) of 1.94% on the 30-second closed-set trials of the 2007 NIST Language Recognition Evaluation (LRE). This represents a 14.9% relative improvement in EER over a sophisticated system that uses parallel phone recognizers, speaker adaptive training (SAT) of the acoustic models and CMLLR adaptation. Moreover, the approach provides consistent and substantial improvements in three different phonotactic systems, each of which uses a single phone recognizer.
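The quoted 1.94% figure is an equal error rate; for reference, here is a minimal sketch of how EER is computed from target and non-target trial scores (the score distributions below are synthetic, not the paper's system outputs).

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Sweep thresholds over the observed scores and return the operating point
    where false-rejection and false-acceptance rates are (nearly) equal."""
    target_scores = np.asarray(target_scores)
    nontarget_scores = np.asarray(nontarget_scores)
    best = (1.0, 0.0)
    for thr in np.concatenate([target_scores, nontarget_scores]):
        frr = np.mean(target_scores < thr)       # targets rejected
        far = np.mean(nontarget_scores >= thr)   # non-targets accepted
        if abs(frr - far) < abs(best[0] - best[1]):
            best = (frr, far)
    return (best[0] + best[1]) / 2

rng = np.random.default_rng(0)
targets = rng.normal(2.0, 1.0, 1000)        # scores for same-language trials
nontargets = rng.normal(-2.0, 1.0, 10000)   # scores for different-language trials
print(f"EER = {100 * equal_error_rate(targets, nontargets):.2f}%")
```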
Citations: 1