Pub Date: 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423479
Yong Xu, Wu Guo, Lirong Dai
Spoken term detection (STD) is the task of open-vocabulary search in large speech recordings. Although detection performance for in-vocabulary (INV) terms has improved greatly, performance for out-of-vocabulary (OOV) terms remains disappointing. In this paper, we propose combining fragment-based and syllable-based search into a hybrid STD system for OOV terms. Syllables are knowledge-based subword units, whereas fragments are data-driven. We first compare how well each models OOVs. Considering their potential complementarity, we explore two fusion methods: index fusion (combining the triphone indexes of a fragment-based and a syllable-based system) and result fusion (merging the search results of the two systems). With result fusion, we achieve a 9.4% relative improvement in actual term weighted value (ATWV) on the NIST STD06 English conversational telephone speech (CTS) EvalSet.
{"title":"A hybrid fragment / syllable-based system for improved OOV term detection","authors":"Yong Xu, Wu Guo, Lirong Dai","doi":"10.1109/ISCSLP.2012.6423479","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423479","journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing"},"publicationDate":"2012-12-01"}
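The abstract reports its result as a relative improvement in ATWV without stating the metric. As a minimal sketch, assuming the term weighted value definition from the NIST STD 2006 evaluation plan (per-term miss and false-alarm probabilities, with a false-alarm weight of 999.9), the metric and a relative improvement could be computed as follows. The `TermCounts` type and function names are hypothetical, and nothing here is taken from the paper itself.

```python
from dataclasses import dataclass
from typing import Iterable

BETA = 999.9  # false-alarm weight, assumed from the NIST STD 2006 evaluation plan


@dataclass
class TermCounts:
    """Hypothetical per-term tallies at the system's actual decision threshold."""
    n_true: int      # reference occurrences of the term
    n_correct: int   # correctly detected occurrences
    n_spurious: int  # false alarms


def atwv(counts: Iterable[TermCounts], speech_seconds: float, beta: float = BETA) -> float:
    """Actual term weighted value, averaged over terms that occur in the reference."""
    penalties = []
    for c in counts:
        if c.n_true == 0:
            continue  # terms with no reference occurrence are left out of the average
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials are approximated as one per second of speech.
        p_fa = c.n_spurious / (speech_seconds - c.n_true)
        penalties.append(p_miss + beta * p_fa)
    if not penalties:
        raise ValueError("no term has a reference occurrence")
    return 1.0 - sum(penalties) / len(penalties)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline
```

Because the false-alarm term is multiplied by a large weight, a handful of spurious detections can cancel out many correct ones, which is one reason OOV terms tend to score poorly under this metric.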
Pub Date: 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423499
M. Unoki, Xugang Lu
Methods of speech enhancement have recently been proposed to suppress the effects of background noise and reverberation. These methods regard the effect of background noise as additive and that of reverberation as convolutive, so noise reduction and dereverberation have been applied separately, in tandem. We previously unified the effects of noise and reverberation under the modulation transfer function (MTF) concept and proposed an approach for removing both simultaneously. This paper further verifies the proposed approach, mathematically and quantitatively. In addition, based on mathematical analysis, we generalize the model to four types of real application scenarios, corresponding to far-field and near-field conditions in noisy reverberant environments. We carried out 12,000 simulations for each of the four application scenarios, denoising and dereverberating noisy reverberant speech, and objectively evaluated the proposed approach. The experimental results show that the proposed approach works well in simultaneously denoising and dereverberating noisy reverberant speech.
{"title":"Unified denoising and dereverberation method used in restoration of MTF-based power envelope","authors":"M. Unoki, Xugang Lu","doi":"10.1109/ISCSLP.2012.6423499","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423499","journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing"},"publicationDate":"2012-12-01"}
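The abstract does not give the formulas behind the MTF concept. As a rough illustration, assuming the classic Houtgast-Steeneken expressions (exponentially decaying reverberation with reverberation time `t60`, and stationary additive noise at a given SNR), the combined modulation transfer function is the product of a reverberation factor and a noise factor. The function names are hypothetical, and this sketch is not the authors' restoration method.

```python
import math


def mtf_reverberation(f_m: float, t60: float) -> float:
    """Modulation reduction from reverberation at modulation frequency f_m (Hz).

    Assumes an exponentially decaying impulse response with reverberation
    time t60 (seconds); 13.8 is approximately 6 * ln(10).
    """
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_m * t60 / 13.8) ** 2)


def mtf_noise(snr_db: float) -> float:
    """Modulation reduction from stationary additive noise at the given SNR (dB)."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))


def mtf_combined(f_m: float, t60: float, snr_db: float) -> float:
    """Combined MTF: reverberation acts as a low-pass filter on the power
    envelope, while noise scales modulation depth uniformly across f_m."""
    return mtf_reverberation(f_m, t60) * mtf_noise(snr_db)
```

Under these assumptions, noise lowers the modulation depth by the same factor at every modulation frequency, while reverberation attenuates faster modulations more strongly, which is why a single MTF-based model can describe both distortions of the power envelope at once.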
Pub Date: 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423453
Maolin Wang, Shengnan Xiong, Jiayun Li, Ziyu Xiong
This study analyzes trans-consonantal vowel-to-vowel coarticulation in Chinese. The target words take the form `bV1.ba' and are designed to occur in sentence-initial, sentence-medial, and sentence-final positions; the subjects are eight native speakers of Standard Chinese. Vowel formants are examined at the onset, middle, and offset points of the target vowel. The results show that trans-segmental coarticulation exists in Chinese, especially at the onset point of the target vowel. Coarticulation is more likely to occur in F2, and its effect is comparatively strong in sentence-initial position.
{"title":"A study on the coarticulation of bi-syllabic words in Chinese","authors":"Maolin Wang, Shengnan Xiong, Jiayun Li, Ziyu Xiong","doi":"10.1109/ISCSLP.2012.6423453","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423453","journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing"},"publicationDate":"2012-12-01"}
Pub Date: 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423497
Xiaoyin Fu, Wei Wei, Lichun Fan, Shixiang Lu, Bo Xu
Hierarchical phrase-based (HPB) translation has been introduced into speech-to-speech (S2S) translation systems on mobile terminals such as smartphones. However, it suffers from explosive growth in the number of rules and a corresponding increase in decoding time, which is a problem for S2S translation systems where memory and decoding speed are restricted. In this paper, we propose a nesting HPB model that captures the topological structure of hierarchical rules on the source-language side, which not only filters out redundant rules in the HPB model but also speeds up the decoder. Experiments on an HPB translation system show that our approach reduces the rule table size by 75%, decodes faster, and yields the same translation quality (measured by BLEU) as the state-of-the-art HPB model.
{"title":"Nesting hierarchical phrase-based model for speech-to-speech translation","authors":"Xiaoyin Fu, Wei Wei, Lichun Fan, Shixiang Lu, Bo Xu","doi":"10.1109/ISCSLP.2012.6423497","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423497","journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing"},"publicationDate":"2012-12-01"}