Title: Intra-conversation intra-speaker variability compensation for speaker clustering
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423465
Kui Wu, Yan Song, Wu Guo, Lirong Dai
Recently, speaker clustering approaches that exploit intra-conversation variability in the total variability space have shown promising performance. However, variability also exists across different segments of the same speaker within a conversation, termed intra-conversation intra-speaker variability, which may scatter the distribution of the i-vector representations of short speech segments and degrade clustering performance. To address this issue, we propose a new speaker clustering approach based on an extended total variability factor analysis. In the proposed method, the intra-conversation total variability space is divided into inter-speaker and intra-speaker variability subspaces, and by explicitly compensating for the intra-conversation intra-speaker variability, short speech segments are represented more accurately. To evaluate the effectiveness of the proposed method, we conduct extensive experiments on the NIST SRE 2008 summed-channel telephone dataset. The experimental results show that the proposed method clearly outperforms other state-of-the-art speaker clustering techniques in terms of clustering error rate.
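A minimal Python sketch in the spirit of the paper, not its extended factor analysis: estimate an intra-speaker nuisance subspace from labeled development i-vectors (NAP-style), project it out of the segment i-vectors, then cluster. All function names are hypothetical.

```python
# Minimal sketch, assuming labeled development i-vectors are available; the
# paper's actual factor-analysis decomposition differs from this NAP-style
# projection.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def intra_speaker_subspace(ivectors, speaker_labels, rank=2):
    """Top directions of within-speaker scatter on development data."""
    deviations = []
    for spk in np.unique(speaker_labels):
        segs = ivectors[speaker_labels == spk]
        deviations.append(segs - segs.mean(axis=0))
    _, _, vt = np.linalg.svd(np.concatenate(deviations), full_matrices=False)
    return vt[:rank]                          # (rank, dim) nuisance basis

def compensate(ivectors, nuisance_basis):
    """Remove the intra-speaker subspace component from each i-vector."""
    proj = nuisance_basis.T @ nuisance_basis
    return ivectors - ivectors @ proj

def cluster_segments(ivectors, n_speakers):
    """Length-normalize, then average-linkage clustering on cosine distance."""
    x = ivectors / np.linalg.norm(ivectors, axis=1, keepdims=True)
    z = linkage(x, method="average", metric="cosine")
    return fcluster(z, t=n_speakers, criterion="maxclust")
```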
Title: Exploring mutual information for GMM-based spectral conversion
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423477
Hsin-Te Hwang, Yu Tsao, H. Wang, Yih-Ru Wang, Sin-Horng Chen
In this paper, we propose a maximum mutual information (MMI) training criterion to refine the parameters of the joint density GMM (JDGMM) set and tackle the over-smoothing issue in voice conversion (VC). Conventionally, the maximum likelihood (ML) criterion is used to train a JDGMM set, which characterizes the joint properties of the source and target feature vectors. The MMI training criterion, in contrast, updates the parameters of the JDGMM set to increase its capability of modeling the dependency between the source and target feature vectors, and thus makes the converted sounds closer to natural ones. Subjective listening tests demonstrate that the quality and individuality of speech converted with the proposed ML-followed-by-MMI (ML+MMI) training method are better than with ML training alone.
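As an illustration of the criterion only (not the authors' parameter update), the sketch below evaluates an MMI-style objective, the mean of log p(x,y) - log p(x) - log p(y), under a JDGMM on stacked vectors z = [x; y]; it uses the fact that the marginal of a joint GMM is itself a GMM over the matching mean/covariance sub-blocks, so the marginal likelihoods come directly from the joint parameters.

```python
# Hypothetical sketch: evaluating an MMI-style objective for a JDGMM.
import numpy as np
from scipy.stats import multivariate_normal

def gmm_logpdf(z, weights, means, covs):
    """Per-frame log sum_k w_k N(z; mu_k, S_k) via log-sum-exp."""
    comp = np.stack([np.log(w) + multivariate_normal.logpdf(z, mean=m, cov=S)
                     for w, m, S in zip(weights, means, covs)])   # (K, T)
    m = comp.max(axis=0)
    return m + np.log(np.exp(comp - m).sum(axis=0))

def mmi_objective(x, y, weights, means, covs, dx):
    """means/covs parameterize the joint z = [x; y]; dx = dim of x."""
    z = np.hstack([x, y])
    ll_joint = gmm_logpdf(z, weights, means, covs)
    ll_x = gmm_logpdf(x, weights, [m[:dx] for m in means],
                      [S[:dx, :dx] for S in covs])
    ll_y = gmm_logpdf(y, weights, [m[dx:] for m in means],
                      [S[dx:, dx:] for S in covs])
    return np.mean(ll_joint - ll_x - ll_y)
```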
Title: A feature-transform based approach to unsupervised task adaptation and personalization
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423513
Jian Xu, Zhijie Yan, Qiang Huo
This paper presents a feature-transform based approach to unsupervised task adaptation and personalization for speech recognition. Given task-specific speech data collected from a deployed service, an “acoustic sniffing” module is first built using the so-called i-vector technique, with a number of acoustic conditions identified via i-vector clustering. Unsupervised maximum likelihood training is then performed to estimate a task-dependent feature transform for each acoustic condition, while the pre-trained HMM parameters of the acoustic models are kept unchanged. Given an unknown utterance, an appropriate feature transform is selected via “acoustic sniffing” and used to transform the utterance's feature vectors for decoding. The effectiveness of the proposed method is confirmed in a task adaptation scenario from a conversational telephone speech transcription task to a short message dictation task. The same method is expected to work for personalization as well.
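A minimal sketch of the “acoustic sniffing” pipeline, under the simplifying assumption that each condition's transform is a diagonal mean/variance-normalizing affine map (a CMN/CVN stand-in for the paper's ML-trained transforms); the class and parameter names are hypothetical.

```python
# Hypothetical sketch: k-means over utterance i-vectors defines acoustic
# conditions; each condition gets a simple diagonal affine feature transform;
# at test time the nearest condition's transform is applied while the HMMs
# stay untouched.
import numpy as np
from sklearn.cluster import KMeans

class AcousticSniffer:
    def __init__(self, n_conditions=4):
        self.km = KMeans(n_clusters=n_conditions, n_init=10, random_state=0)
        self.transforms = {}                  # condition id -> (A, b)

    def fit(self, ivectors, features_per_utt):
        cond = self.km.fit_predict(np.asarray(ivectors))
        for c in np.unique(cond):
            frames = np.concatenate(
                [f for f, ci in zip(features_per_utt, cond) if ci == c])
            mu, sd = frames.mean(axis=0), frames.std(axis=0) + 1e-8
            self.transforms[c] = (np.diag(1.0 / sd), -mu / sd)
        return self

    def transform(self, ivector, features):
        c = int(self.km.predict(np.asarray(ivector)[None, :])[0])  # sniff
        A, b = self.transforms[c]
        return features @ A.T + b             # x' = A x + b
```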
Title: A phone segmentation method and its evaluation on Mandarin speech corpus
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423515
Dac-Thang Hoang, Hsiao-Chuan Wang
This paper presents a phone segmentation method that requires no prior knowledge of the text content. The proposed method is an unsupervised phone boundary detector based on a band-energy tracing technique. It outperforms previous work when applied to the TIMIT corpus, but its performance degrades when applied to a Mandarin Chinese speech database, the TCC300 corpus. The evaluation on this Mandarin speech corpus reveals some interesting factors that may make phone boundary detection difficult, and we propose some ideas that may help improve the phone segmentation method in future work.
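The following is a rough, hypothetical rendering of boundary detection from band-energy trajectories (the paper's exact tracing technique may differ): log band energies are computed from an STFT, and boundary candidates are placed at peaks of the frame-to-frame change.

```python
# Hypothetical sketch of band-energy based phone boundary detection.
import numpy as np
from scipy.signal import stft, find_peaks

def band_energy_boundaries(x, fs, n_bands=8, frame_ms=10):
    hop = int(fs * frame_ms / 1000)
    f, t, Z = stft(x, fs=fs, nperseg=4 * hop, noverlap=3 * hop)
    power = np.abs(Z) ** 2
    bands = np.array_split(power, n_bands, axis=0)    # equal-width bands
    E = np.log(np.stack([b.sum(axis=0) for b in bands]) + 1e-10)  # (B, T)
    change = np.abs(np.diff(E, axis=1)).sum(axis=0)   # spectral-change curve
    peaks, _ = find_peaks(change, height=change.mean(),
                          distance=max(1, int(30 / frame_ms)))  # >= 30 ms apart
    return t[peaks + 1]                               # boundary times (s)
```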
Title: Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423457
Xin Wang, Zhenhua Ling, Lirong Dai
In our previous work, we presented a cross-stream dependency modeling method for hidden Markov model (HMM) based parametric speech synthesis. In that method, the multi-space probability distribution (MSD) was adopted for F0 modeling, and voicing decision errors severely degraded the accuracy of the generated spectral features. Therefore, this paper proposes a cross-stream dependency modeling method using a continuous F0 HMM (CF-HMM) to circumvent voicing decisions during the generation of spectral features. In addition, to prevent over-fitting in model training, regression classes are introduced to tie the transform matrices of the dependency models. Experiments on the proposed methods show both an improvement in the accuracy of the generated spectral features and the effectiveness of introducing regression classes in dependency model training.
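A toy sketch of the dependency structure described above, assuming a simple conditional-Gaussian form in which a state's spectral mean is shifted by a linear transform of the continuous F0 observation, with the transform tied across states in the same regression class; the names are illustrative, not the paper's formulation.

```python
# Toy sketch: regression-class tying means states in the same class share
# one transform matrix, which limits over-fitting.
import numpy as np

class DependentSpectralStream:
    def __init__(self, state_means, state_to_regclass, n_f0_dims, seed=0):
        self.mu = np.asarray(state_means)         # (n_states, spec_dim)
        self.reg = np.asarray(state_to_regclass)  # (n_states,) class ids
        n_classes = int(self.reg.max()) + 1
        rng = np.random.default_rng(seed)
        # one tied transform per regression class, not one per state
        self.A = rng.normal(scale=0.01,
                            size=(n_classes, self.mu.shape[1], n_f0_dims))

    def conditional_mean(self, state, f0_obs):
        """E[spec | state, f0] = mu_state + A_{regclass(state)} @ f0_obs."""
        return self.mu[state] + self.A[self.reg[state]] @ np.asarray(f0_obs)
```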
Title: Tongue shape synthesis based on Active Shape Model
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423537
Chan Song, Jianguo Wei, Qiang Fang, Shen Liu, Yuguang Wang, J. Dang
The magnetic resonance imaging (MRI) technique is now widely used in speech production research, since it acquires high-spatial-resolution data of the vocal tract shape without any known radiation harm. However, establishing a comprehensive articulatory database with MRI would be time-consuming and expensive, owing to its low temporal resolution and the large expense of MRI equipment. In this study, we propose a method that interpolates tongue shapes between static vowels to obtain dynamic tongue shapes. First, a set of parameters controlling tongue shape is extracted based on an Active Shape Model (ASM). Then, the control parameters are interpolated to synthesize dynamic tongue shapes from the static vowels' articulations. To evaluate the method, a set of key points was chosen from both the MRI images and the synthesized tongue shapes. The results suggest that the dynamic properties of these key points in the synthesized tongue shapes resemble those of the actual dynamic tongue shapes.
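A minimal sketch of the interpolation scheme, assuming pre-aligned tongue contours and an ASM reduced to its point-distribution (PCA) core; the function names are hypothetical.

```python
# Minimal sketch: fit a point-distribution model, project two static vowel
# shapes into parameter space, and interpolate the parameters linearly.
import numpy as np

def fit_asm(shapes, n_modes=5):
    """shapes: (n_samples, 2*n_points) flattened, pre-aligned contours."""
    mean = shapes.mean(axis=0)
    _, _, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, vt[:n_modes]                 # mean shape and modes P

def to_params(shape, mean, P):
    return P @ (shape - mean)                 # b = P (s - s_bar)

def interpolate_tongue(shape_a, shape_b, mean, P, n_steps=10):
    """Linear interpolation in ASM parameter space between two vowels."""
    ba, bb = to_params(shape_a, mean, P), to_params(shape_b, mean, P)
    alphas = np.linspace(0.0, 1.0, n_steps)
    return np.stack([mean + P.T @ ((1 - a) * ba + a * bb) for a in alphas])
```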
Title: The coarticulation resistance of consonants in standard Chinese - An electropalatographic and acoustic study
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423545
Yinghao Li, Jinghua Zhang, Jiangping Kong
The coarticulation resistance (CR) of 21 initial consonants in Standard Chinese was examined in CV monosyllables and symmetrical V1#C2V2 sequences (# stands for a morpheme boundary) by analyzing electropalatographic (EPG) and acoustic signals. The slope of the F2 locus equation was compared with that of an articulatory regression function, calculated by regressing the total linguopalatal contact ratio at the vowel target frame against that at the consonantal release/approach frame. The results show that the slopes derived from the articulatory regression functions were the most appropriate measure of consonant CR. Taken together, the CR scale for Standard Chinese forms a continuum in ascending order: labial < velar < alveolar < dental, retroflex, and alveolo-palatal consonants. This CR scale applies not only to the CV monosyllable set but also to the V1#C2 and #C2V2 transitions in V1#C2V2 sequences. The overall results support the DAC model in that the coarticulation resistance of a consonant depends closely on the involvement of the tongue dorsum gesture in segment production.
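For readers unfamiliar with locus equations, both regressions compared in the paper reduce to fitting straight-line slopes; the sketch below shows the two fits on made-up illustrative values, where a flatter locus-equation slope indicates higher coarticulation resistance.

```python
# Hypothetical sketch of the two slope measures; numeric values are made up
# purely for illustration.
import numpy as np

def regression_slope(x, y):
    """Least-squares slope of y = a*x + b."""
    a, _ = np.polyfit(x, y, deg=1)
    return a

# illustrative per-token measurements for one consonant
f2_vowel_target = np.array([1300.0, 1750.0, 1100.0, 2100.0])   # Hz
f2_cv_onset = np.array([1500.0, 1700.0, 1400.0, 1900.0])       # Hz
contact_release = np.array([0.45, 0.50, 0.42, 0.55])  # linguopalatal ratios
contact_vowel = np.array([0.30, 0.48, 0.22, 0.60])

# F2 locus equation: onset F2 regressed on vowel-target F2
locus_slope = regression_slope(f2_vowel_target, f2_cv_onset)
# articulatory analogue: vowel-target contact regressed against the contact
# at the consonantal release/approach frame
articulatory_slope = regression_slope(contact_release, contact_vowel)
```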
Title: Acoustic and articulatory analysis on Japanese vowels in emotional speech
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423516
Mengxue Cao, Ai-jun Li, Qiang Fang, Jianguo Wei, Chan Song, J. Dang
Acoustic and articulatory features of Japanese vowels were examined in “Neutral”, “Angry”, and “Sad” speech using the NDI Wave system. The results suggest that: (1) significant differences in the acoustic space, measured by F1 and F2, exist among the emotions, with “Angry” characterized by a horizontally compressed acoustic space and “Sad” by a vertically compressed one; (2) the “front raising” and “retraction and back raising” patterns of tongue movement can be enhanced by the “Angry” and “Sad” emotions; (3) the dynamically protruding lip feature is shared by both “Angry” and “Sad”, with the exception of [A], which we attribute to an increase in mouth opening, since mouth opening and degree of lip protrusion are a pair of complementary features; and (4) in the articulatory domain, “Angry” is characterized by an increase in mouth opening and a reduced horizontal range of tongue movement.
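One way finding (1) could be quantified (a hypothetical sketch, not necessarily the authors' measurement) is to compare each emotion's F1/F2 vowel space by its convex-hull area and its horizontal (F2) and vertical (F1) extents:

```python
# Hypothetical sketch: "horizontally compressed" vs. "vertically compressed"
# acoustic spaces compared numerically. Formant values are illustrative.
import numpy as np
from scipy.spatial import ConvexHull

def vowel_space_stats(f1, f2):
    pts = np.column_stack([f2, f1])      # x = F2, y = F1, as on vowel charts
    hull = ConvexHull(pts)
    return {"area": hull.volume,         # for a 2-D hull, .volume is the area
            "f2_range": f2.max() - f2.min(),   # horizontal extent
            "f1_range": f1.max() - f1.min()}   # vertical extent

# illustrative five-vowel formant means (Hz) for one emotion
f1 = np.array([850.0, 300.0, 420.0, 320.0, 500.0])
f2 = np.array([1600.0, 2300.0, 2000.0, 800.0, 1000.0])
print(vowel_space_stats(f1, f2))
```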
Title: Incorporating dynamic features into minimum generation error training for HMM-based speech synthesis
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423486
Duy Khanh Ninh, M. Morise, Y. Yamashita
This paper describes new methods of minimum generation error (MGE) training in HMM-based speech synthesis that introduce the error component of dynamic features into the generation error function. We propose two methods for setting the weight associated with the additional error component. In the fixed weighting approach, this weight is kept constant over the course of the speech. In the adaptive weighting approach, it is adjusted according to the degree of dynamics of the speech segments. Objective evaluation shows that the newly derived MGE criterion with adaptive weighting achieves comparable performance on static features and better performance on delta features compared with the baseline MGE criterion. Subjective evaluation exhibits an improvement in the quality of the synthesized speech with the proposed technique. The new criterion improves the HMMs' ability to capture the dynamic properties of speech without increasing the computational complexity of training compared with the baseline criterion.
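A numerical sketch of the extended error function, assuming simple central-difference deltas; the adaptive rule shown (a larger delta-error weight where the natural trajectory changes fastest) is illustrative, not the paper's exact weighting.

```python
# Sketch: static generation error plus a weighted dynamic-feature error,
# with either a fixed scalar weight or per-frame adaptive weights.
import numpy as np

def delta(c):
    """First-order dynamic features by central difference; c is (T, D)."""
    d = np.zeros_like(c)
    d[1:-1] = 0.5 * (c[2:] - c[:-2])
    return d

def generation_error(c_gen, c_nat, w=0.5):
    """w may be a scalar (fixed weighting) or a per-frame array (adaptive)."""
    e_static = np.sum((c_gen - c_nat) ** 2, axis=1)
    e_delta = np.sum((delta(c_gen) - delta(c_nat)) ** 2, axis=1)
    return np.mean(e_static + w * e_delta)

def adaptive_weights(c_nat, w_lo=0.2, w_hi=1.0):
    """Per-frame weights scaled by the local degree of dynamics."""
    speed = np.linalg.norm(delta(c_nat), axis=1)
    s = (speed - speed.min()) / (np.ptp(speed) + 1e-8)
    return w_lo + (w_hi - w_lo) * s
```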
Title: Effects of excitation spread on the intelligibility of Mandarin speech in cochlear implant simulations
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423502
Fei Chen, Tian Guan, L. Wong
Noisy listening conditions remain challenging for most cochlear implant patients. The present study simulated the effects of the decay rate of excitation spread in cochlear implants on the intelligibility of Mandarin speech in noise. Mandarin sentence and tone stimuli were processed by a noise vocoder and presented to normal-hearing listeners for identification. The decay rates of excitation spread were simulated by varying the slopes of the synthesis filters in the noise vocoder. Experimental results showed a significant benefit for Mandarin sentence recognition in noise with the narrower type of excitation, while Mandarin tone identification was relatively robust to the influence of excitation spread. These results suggest that reducing the decay rate of excitation spread may improve speech perception in noise for future cochlear implants.
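A rough sketch of the vocoder manipulation, assuming Butterworth analysis and synthesis filters, where the synthesis-filter order stands in for the roll-off slope (the decay rate of simulated excitation spread): lower order means a shallower slope and broader simulated spread. All parameters are illustrative, and the sketch assumes fs of at least 16 kHz so the 6 kHz band edge stays below Nyquist.

```python
# Hypothetical noise-vocoder sketch: band envelopes from fixed analysis
# filters modulate noise shaped by synthesis filters of adjustable order.
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt

def noise_vocoder(x, fs, n_bands=6, synth_order=2):
    edges = np.geomspace(100.0, 6000.0, n_bands + 1)   # log-spaced channels
    env_lp = butter(2, 160.0, btype="low", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        ana = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        # lower synth_order -> shallower slope -> broader simulated spread
        syn = butter(synth_order, [lo, hi], btype="band", fs=fs, output="sos")
        env = np.maximum(sosfiltfilt(env_lp, np.abs(sosfilt(ana, x))), 0.0)
        out += sosfilt(syn, env * rng.standard_normal(len(x)))
    return out / (np.max(np.abs(out)) + 1e-9)
```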