
Latest publications: 5th International Conference on Spoken Language Processing (ICSLP 1998)

Parametric trajectory mixtures for LVCSR
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-685
M. Siu, R. Iyer, H. Gish, Carl Quillen
Parametric trajectory models explicitly represent the temporal evolution of the speech features as a Gaussian process with time-varying parameters. HMMs are a special case of such models, one in which the trajectory constraints in the speech segment are ignored by the assumption of conditional independence across frames within the segment. In this paper, we investigate in detail some extensions to our trajectory modeling approach aimed at improving LVCSR performance: (i) improved modeling of mixtures of trajectories via better initialization, (ii) modeling of context dependence, and (iii) improved segment boundaries by means of search. We will present results in terms of both phone classification and recognition accuracy on the Switchboard corpus.
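As a minimal sketch of the idea (not the authors' implementation), a segment can be scored under a polynomial mean trajectory; with a degree-0 polynomial the mean is constant across frames, which recovers the frame-independent HMM special case the abstract mentions. Feature dimension is reduced to one for simplicity:

```python
import numpy as np

def trajectory_log_likelihood(frames, coeffs, var):
    """Log-likelihood of a segment under a polynomial mean trajectory.

    frames: 1-D array, one scalar feature per frame.
    coeffs: polynomial coefficients of the time-varying mean.
    var:    shared observation variance.

    The mean at normalized segment time t is a polynomial in t, so it
    evolves within the segment; a single constant coefficient gives the
    HMM-style constant-mean state.
    """
    T = len(frames)
    t = np.linspace(0.0, 1.0, T)                          # normalized time
    design = np.vander(t, len(coeffs), increasing=True)   # [1, t, t^2, ...]
    mean = design @ coeffs                                # time-varying mean
    resid = frames - mean
    return float(-0.5 * np.sum(resid**2 / var + np.log(2 * np.pi * var)))
```

A linearly rising feature scores higher under a matching linear trajectory than under the best constant-mean model, which is the intuition behind using trajectories instead of plain HMM states.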
Cited: 16
Speech communication profiles across the adult lifespan: persons without self-identified hearing impairment
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-790
M. Cheesman, K. Smilsky, T. Major, F. Lewis, L. Boorman
A sample of 209 adults ranging from 20 to 79 years of age was studied to measure speech communication profiles as a function of age in persons who did not identify themselves as hearing impaired. The study was conducted in order to evaluate age-related speech perception abilities and communication profiles in a population who do not present for hearing assessment and who are not included in census statistics as having hearing problems. Audiometric assessment, demographic and hearing history self-reports, speech reception thresholds, consonant discrimination perception in quiet and noise, and the Communication Profile for the Hearing Impaired (CPHI) were the instruments used to develop speech communication profiles. Hearing performance decreased with increased age. However, despite self-reports of no hearing impairment, many subjects over age 50 had audiometric thresholds that indicated hearing impairment. The responses to the CPHI were correlated to audiometric thresholds, but also to the age of the respondent, when hearing thresholds had been controlled statistically. A comparison of CPHI responses from this study and those of two other samples in clinical populations revealed only slightly different patterns of behaviour in the present sample when confronted with communication difficulties.
Cited: 0
A minimax search algorithm for CDHMM based robust continuous speech recognition
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-304
Hui Jiang, K. Hirose, Qiang Huo
In this paper, we propose a novel implementation of a minimax decision rule for continuous density hidden Markov model based robust speech recognition. By combining the idea of the minimax decision rule with a normal Viterbi search, we derive a recursive minimax search algorithm, where the minimax decision rule is repetitively applied to determine the partial paths during the search procedure. Because of its intrinsic nature as a recursive search, the proposed method can be easily extended to perform continuous speech recognition. Experimental results on Japanese isolated digits and TIDIGITS, where the mismatch between training and testing conditions is caused by additive white Gaussian noise, show the viability and efficiency of the proposed minimax search algorithm.
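The decision rule itself can be sketched in a few lines (this omits the recursive partial-path machinery of the paper and just shows the minimax principle under an assumed set of mismatch-condition models): each hypothesis is scored under every model in the uncertainty set and judged by its least favourable score, and the best worst case wins.

```python
def minimax_decode(hyp_scores):
    """Minimax decision rule over an uncertainty set of models.

    hyp_scores: {hypothesis: [log-likelihood under model 1, model 2, ...]}

    Instead of trusting a single (possibly mismatched) model, each
    hypothesis is rated by its worst-case log-likelihood across the set,
    and the hypothesis with the best worst case is selected.
    """
    return max(hyp_scores, key=lambda h: min(hyp_scores[h]))
```

For example, a hypothesis that scores well under a clean-speech model but collapses under a noisy-condition model loses to one that is merely adequate under both.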
Cited: 7
Nozomi - a fast, memory-efficient stack decoder for LVCSR
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-627
M. Schuster
This paper describes some of the implementation details of the "Nozomi" stack decoder for LVCSR. The decoder was tested on a Japanese Newspaper Dictation Task using a 5000 word vocabulary. Using continuous density acoustic models with 2000 and 3000 states trained on the JNAS/ASJ corpora and a 3-gram LM trained on the RWC text corpus, both models provided by the IPA group [7], it was possible to reach more than 95% word accuracy on the standard test set. With computationally cheap acoustic models we could achieve around 89% accuracy in near real time on a 300 MHz Pentium II. Using a disk-based LM the memory usage could be optimized to 4 MB in total.
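A stack decoder keeps a sorted stack of partial hypotheses and always extends the most promising one. The following best-first sketch (a generic toy, not the Nozomi implementation; `expand`, `is_goal`, and the crude stack-size prune are assumptions) shows the control loop:

```python
import heapq

def stack_decode(start, expand, is_goal, beam=50):
    """Best-first (stack) decoding over partial hypotheses.

    expand(hyp) yields (next_hyp, incremental_log_prob) pairs.
    Scores are log-probabilities, so negated scores go into a min-heap,
    which plays the role of the sorted stack.
    """
    heap = [(0.0, start)]
    while heap:
        neg_score, hyp = heapq.heappop(heap)
        if is_goal(hyp):
            return hyp, -neg_score
        for nxt, inc in expand(hyp):
            heapq.heappush(heap, (neg_score - inc, nxt))
        del heap[beam:]   # crude prune: cap the stack size (heap prefix stays valid)
    return None, float("-inf")
```

With admissible incremental scores this pops hypotheses in best-first order, so the first complete hypothesis popped is the best one within the pruned stack.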
Cited: 5
Evidence of dual-route phonetic encoding from apraxia of speech: implications for phonetic encoding models
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-789
R. Varley, S. Whiteside
Contemporary psycholinguistic models suggest that there may be dual routes operating in phonetic encoding: a direct route which uses stored syllabic units, and an indirect route which relies on the on-line assembly of sub-syllabic units. The more computationally efficient direct route is more likely to be used for high frequency words, while the indirect route is most likely to be used for novel or low frequency words. We suggest that the acquired neurological disorder of apraxia of speech (AOS) provides a window onto speech encoding mechanisms and that the disorder represents an impairment of direct-route encoding mechanisms and, therefore, a reliance on indirect mechanisms. We report an investigation of the production of high and low frequency words across three subject groups: non-brain damaged control (NBDC, N=3); brain damaged control (BDC, N=3) and speakers with AOS (N=4). The results are presented and discussed within the dual-route phonetic encoding hypothesis.
Cited: 4
An analysis of the timing of turn-taking in a corpus of goal-oriented dialogue
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-81
Matthew Bull, M. Aylett
This paper presents a context-based analysis of the intervals between different speakers’ utterances in a corpus of task-oriented dialogue (the Human Communication Research Centre’s Map Task Corpus; see Anderson et al. 1991). In the analysis, we assessed the relationship between inter-speaker intervals and various contextual factors, such as the effects of eye contact, the presence of conversational game boundaries, the category of move in an utterance, and the degree of experience with the task in hand. The results of the analysis indicated that the main factors which gave rise to significant differences in inter-speaker intervals were those which related to decision-making and planning: the greater the amount of planning, the greater the inter-speaker interval. Differences between speakers were also found to be significant, although this effect did not necessarily interact with all other effects. These results provide unique and useful data for the improved effectiveness of dialogue systems.
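The basic quantity analysed here, the inter-speaker interval, can be computed directly from time-aligned turns. A small sketch (the `(speaker, start, end)` turn format is an assumption, not the corpus's actual annotation scheme):

```python
def inter_speaker_intervals(turns):
    """Signed gap between one speaker's turn end and the next speaker's
    turn start; negative values indicate overlapping speech.

    turns: list of (speaker, start_time, end_time), sorted by start time.
    Same-speaker successive turns are skipped, since only speaker
    changes constitute turn-taking.
    """
    gaps = []
    for (spk_a, _, end_a), (spk_b, start_b, _) in zip(turns, turns[1:]):
        if spk_a != spk_b:
            gaps.append(start_b - end_a)
    return gaps
```

Grouping these gaps by contextual factors (move category, game boundary, eye contact) and comparing the group means is then a standard analysis-of-variance setup.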
Cited: 53
Robust automatic speech recognition by the application of a temporal-correlation-based recurrent multilayer neural network to the mel-based cepstral coefficients
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-328
M. Héon, H. Tolba, D. O'Shaughnessy
In this paper, the problem of robust speech recognition has been considered. Our approach is based on the noise reduction of the parameters that we use for recognition, that is, the Mel-based cepstral coefficients. A Temporal-Correlation-Based Recurrent Multilayer Neural Network (TCRMNN) for noise reduction in the cepstral domain is used in order to get less-variant parameters to be useful for robust recognition in noisy environments. Experiments show that the use of the enhanced parameters using such an approach increases the recognition rate of the continuous speech recognition (CSR) process. The HTK Hidden Markov Model Toolkit was used throughout. Experiments were done on a noisy version of the TIMIT database. With such a pre-processing noise reduction technique in the front end of the HTK-based CSR system, improvements in recognition accuracy of about 17.77% and 18.58% using single-mixture monophones and triphones, respectively, have been obtained at a moderate SNR of 20 dB.
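The core mechanism, a recurrent network whose hidden state carries information across cepstral frames, can be sketched as a single-layer forward pass. This is only a shape-level illustration with placeholder weights, not the trained TCRMNN architecture of the paper:

```python
import numpy as np

def recurrent_denoise(noisy_ceps, W_in, W_rec, W_out):
    """Forward pass of a one-layer recurrent denoiser over cepstral frames.

    noisy_ceps: (T, D) array of noisy MFCC frames.
    W_in (H, D), W_rec (H, H), W_out (D, H): network weights.

    The recurrent state h summarizes past frames, so the mapping from a
    noisy frame to a cleaned frame can exploit temporal correlation
    rather than treating each frame independently.
    """
    h = np.zeros(W_rec.shape[0])
    cleaned = []
    for frame in noisy_ceps:
        h = np.tanh(W_in @ frame + W_rec @ h)   # state mixes current and past input
        cleaned.append(W_out @ h)               # project back to cepstral space
    return np.array(cleaned)
```

In a real system the weights would be trained to map noisy cepstra to their clean counterparts; here they only demonstrate the data flow.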
Cited: 1
Improving speaker recognisability in phonetic vocoders
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-396
C. Ribeiro, I. Trancoso
Phonetic vocoding is one of the methods for coding speech below 1000 bit/s. The transmitter stage includes a phone recogniser whose index is transmitted together with prosodic information such as duration, energy and pitch variation. This type of coder does not transmit spectral speaker characteristics, and speaker recognisability thus becomes a major problem. In our previous work, we adapted a speaker modification strategy to minimise this problem, modifying a codebook to match the spectral characteristics of the input speaker. This is done at the cost of transmitting the LSP averages computed for vowel and glide phones. This paper presents new codebook generation strategies, with gender dependence and interpolation frames, that lead to better speaker recognisability and speech quality. Relative to our previous work, some effort was also devoted to deriving more efficient quantization methods for the speaker-specific information, which considerably reduced the average bit rate without quality degradation. For the CD-ROM version, a set of audio files is also included.
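A back-of-the-envelope rate calculation shows why transmitting only a phone index plus quantized prosody stays well below 1000 bit/s. The segment rate and bit allocations below are illustrative assumptions, not figures from the paper:

```python
import math

def phonetic_vocoder_rate(n_phones, segments_per_sec, prosody_bits):
    """Approximate bit rate of a phonetic vocoder.

    Per segment we send one phone index (ceil(log2(inventory size)) bits)
    plus quantized prosody (duration, energy, pitch) bits.
    """
    index_bits = math.ceil(math.log2(n_phones))
    return segments_per_sec * (index_bits + prosody_bits)
```

With an assumed 64-phone inventory, roughly 12 segments per second, and 14 bits of prosody per segment, the rate is 12 * (6 + 14) = 240 bit/s, which leaves room for occasional speaker-adaptation side information such as the LSP averages mentioned above.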
Cited: 4
Prosodic vs. segmental contributions to naturalness in a diphone synthesizer
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-15
H. Bunnell, S. Hoskins, Debra Yarrington
The relative contributions of segmental versus prosodic factors to the perceived naturalness of synthetic speech were measured by transplanting prosody between natural speech and the output of a diphone synthesizer. A small corpus was created containing matched sentence pairs wherein one member of the pair was a natural utterance and the other was a synthetic utterance generated with diphone data from the same talker. Two additional sentences were formed from each sentence pair by transplanting the prosodic structure between the natural and synthetic members of each pair. In two listening experiments subjects were asked to (a) classify each sentence as “natural” or “synthetic”, or (b) rate the naturalness of each sentence. Results showed that the prosodic information was more important than segmental information in both classification and ratings of naturalness.
Cited: 7
Perceptual properties of Russians with Japanese fricatives
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-719
S. Funatsu, S. Kiritani
This study investigated the perceptual properties of second language learners in acquiring second language phonemes. The case where the relation between two phonemes of a second language and those of a native language changes according to the following vowels was studied. The perceptual properties of Russians with regard to Japanese fricatives were examined. In the perception test, the confusion of [ɕo] with [so] was very large. This phenomenon could be caused by the difference between the transition onset time from [s’] to vowels and that from the other consonants to vowels. It is considered that, in the case of following vowels [a] and [o], Russians equated Japanese [s] and [ɕ] with Russian [s] and [s'] respectively. However, in the case of [u], they did not equate them in such a manner. This is probably because the acoustic properties of Japanese [ɯ] are very different from those of Russian [u].
Cited: 1