
2008 IEEE Spoken Language Technology Workshop: Latest Publications

Speech-to-text input method for web system using JavaScript
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777877
R. Nisimura, Jumpei Miyake, Hideki Kawahara, T. Irino
We have developed a speech-to-text input method for web systems. The system is provided as a JavaScript library comprising an Ajax-like mechanism based on a Java applet, CGI programs, and dynamic HTML documents. It allows users to access voice-enabled web pages without requiring special browsers. Web developers can embed it in their web pages by inserting only one line in the header field of an HTML document. This study also aims at observing natural spoken interactions in personal environments. We succeeded in collecting 4,003 inputs over a period of seven months via our public Japanese ASR server. To cover out-of-vocabulary words such as proper nouns, a web page for registering new words into the language model was developed. As a result, we obtained an improvement of 0.8% in recognition accuracy. With regard to the acoustic conditions, an SNR of 25.3 dB was observed.
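As an illustration of the server side of such a setup, the following is a minimal Python sketch of a CGI endpoint that accepts posted audio and returns a transcription. The recognize() stub and the request/response format are assumptions for illustration, not the paper's actual protocol.

#!/usr/bin/env python3
# Hypothetical sketch of a server-side CGI endpoint: the in-page JavaScript/applet
# posts raw audio, and the script returns the recognized text as plain text.
# The recognizer call is a stub; the paper's ASR server is not reproduced here.
import sys

def recognize(audio_bytes: bytes) -> str:
    """Placeholder for the Japanese ASR engine behind the public server."""
    # A real deployment would forward audio_bytes to the recognizer here.
    return "recognized text"  # hypothetical transcription

def main() -> None:
    audio = sys.stdin.buffer.read()          # POST body: raw audio from the client
    body = recognize(audio).encode("utf-8")
    sys.stdout.write("Content-Type: text/plain; charset=utf-8\r\n")
    sys.stdout.write(f"Content-Length: {len(body)}\r\n\r\n")
    sys.stdout.flush()
    sys.stdout.buffer.write(body)

if __name__ == "__main__":
    main()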
Citations: 14
Accented Indian English ASR: Some early results
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777881
Kaustubh Kulkarni, Sailik Sengupta, V. Ramasubramanian, Josef G. Bauer, G. Stemmer
The problem of the effect of accent on the performance of Automatic Speech Recognition (ASR) systems is well known. In this paper, we study the effect of accent variability on the performance of an Indian English ASR task. We evaluate the test vocabularies on HMMs trained on (a) accent-specific training data, (b) accent-pooled training data, which combines all the accent-specific training data, and (c) accent-pooled training data of reduced size, matching the size of the accent-specific training data. We demonstrate that the accent-pooled training set performs best on a phonetically rich isolated-word recognition task. However, the accent-specific HMMs perform better than the reduced accent-pooled HMMs, suggesting an approach in which a first-stage accent identification selects the matching accent-trained HMMs for further recognition.
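A minimal Python sketch of the three training conditions compared above. train_word_hmms() and word_accuracy() are hypothetical placeholders for the actual HMM training and isolated-word recognition pipeline.

import random

def compare_accent_conditions(data_by_accent, test_by_accent, train_word_hmms, word_accuracy):
    # (a) accent-specific models: one model set per accent
    specific = {acc: train_word_hmms(utts) for acc, utts in data_by_accent.items()}

    # (b) accent-pooled models: all accents combined
    all_utts = [u for utts in data_by_accent.values() for u in utts]
    pooled_all = train_word_hmms(all_utts)

    # (c) pooled data down-sampled to the size of one accent's training set
    per_accent_size = min(len(utts) for utts in data_by_accent.values())
    pooled_reduced = train_word_hmms(random.sample(all_utts, per_accent_size))

    results = {}
    for acc, test in test_by_accent.items():
        results[acc] = {
            "specific": word_accuracy(specific[acc], test),
            "pooled": word_accuracy(pooled_all, test),
            "pooled_reduced": word_accuracy(pooled_reduced, test),
        }
    return results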
Citations: 4
Evaluation of a spoken dialogue system for controlling a Hifi audio system
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777859
F. Fernández-Martínez, Juan Blázquez, J. Ferreiros, R. Barra-Chicote, Javier Macias-Guarasa, J. M. Lucas
In this paper, a Bayesian network (BN) approach to dialogue modelling is evaluated using a battery of both subjective and objective metrics. Significant effort has been put into improving the contextual-information handling capabilities of the system. Consequently, besides typical usability measures such as task and dialogue completion rates and dialogue time, we include a new measure of the contextuality of the dialogue: the number of turns in which contextual information is helpful for dialogue resolution. The evaluation is carried out over a set of predefined scenarios covering different initiative styles, focusing on the impact of the user's level of experience.
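A minimal Python sketch of how the reported usability measures, including the proposed contextuality figure, could be computed from annotated dialogues; the Dialogue/Turn fields are illustrative assumptions, not the paper's annotation scheme.

from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    context_helped: bool          # annotation: did contextual information resolve this turn?

@dataclass
class Dialogue:
    turns: List[Turn]
    task_completed: bool
    duration_s: float

def evaluate(dialogues: List[Dialogue]) -> dict:
    n = len(dialogues)
    return {
        "task_completion_rate": sum(d.task_completed for d in dialogues) / n,
        "avg_dialogue_time_s": sum(d.duration_s for d in dialogues) / n,
        # contextuality: turns per dialogue where contextual information was helpful
        "avg_contextuality": sum(sum(t.context_helped for t in d.turns) for d in dialogues) / n,
    }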
Citations: 19
A syntactic language model based on incremental CCG parsing
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777876
Hany Hassan, K. Sima'an, Andy Way
Syntactically enriched language models (parsers) constitute a promising component in applications such as machine translation and speech recognition. To maintain a useful level of accuracy, existing parsers are non-incremental and must span a combinatorially growing space of possible structures as every input word is processed. This prohibits their incorporation into standard linear-time decoders. In this paper, we present an incremental, linear-time dependency parser based on Combinatory Categorial Grammar (CCG) and classification techniques. We devise a deterministic transform of CCGbank canonical derivations into incremental ones, and train our parser on this data. We find that a cascaded, incremental version provides an appealing balance between efficiency and accuracy.
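A minimal Python sketch of a classifier-driven, linear-time incremental parser loop (arc-standard dependency transitions). The paper's parser is CCG-based, so this only illustrates the incremental, classification-driven control flow, with the trained classifier replaced by a trivial stub.

from typing import Callable, List, Tuple

def parse_incremental(words: List[str],
                      predict: Callable[[List[int], List[int]], str]) -> List[Tuple[int, int]]:
    """Return dependency arcs as (head_index, dependent_index) pairs."""
    stack: List[int] = []
    buffer: List[int] = list(range(len(words)))
    arcs: List[Tuple[int, int]] = []
    while buffer or len(stack) > 1:
        action = predict(stack, buffer) if len(stack) >= 2 else "SHIFT"
        if action == "LEFT-ARC" and len(stack) >= 2:
            dep, head = stack[-2], stack[-1]
            arcs.append((head, dep)); del stack[-2]
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            head, dep = stack[-2], stack[-1]
            arcs.append((head, dep)); stack.pop()
        elif buffer:
            stack.append(buffer.pop(0))              # SHIFT the next word
        else:                                        # nothing left to shift: force a reduce
            dep, head = stack[-2], stack[-1]
            arcs.append((head, dep)); del stack[-2]
    return arcs

# Stub classifier: always reduce with a right arc once two items are on the stack.
arcs = parse_incremental("we devise a parser".split(), lambda stack, buffer: "RIGHT-ARC")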
Citations: 16
Class-based named entity translation in a speech to speech translation system
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777888
S. Maskey, Martin Cmejrek, Bowen Zhou, Yuqing Gao
Named entity (NE) translation is a challenging problem in machine translation (MT). Most bi-text corpora used for MT training lack enough NE samples to cover the wide variety of contexts in which NEs can appear. In this paper, we present a technique for translating NEs based on their NE types, in addition to a phrase-based translation model. Our NE translation model builds on a syntax-based system similar to the work of Chiang (2005), but we produce syntax-based rules with NE types as non-terminals instead of general non-terminals. Such class-based rules allow us to better generalize the contexts of NEs. We show that the proposed method obtains an absolute improvement of 0.66 BLEU and 0.26% in F1-measure over the phrase-based baseline on the NE test set.
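A minimal Python sketch of the class-token idea: NE spans are replaced by their type tags before decoding, and the type slots in the decoder output are then filled with separately translated NEs. tag_entities(), decode(), and translate_entity() are hypothetical stand-ins for the NE tagger, the decoder with NE-typed non-terminals, and the NE translation component.

import re

def translate_with_ne_classes(sentence, tag_entities, decode, translate_entity):
    entities = tag_entities(sentence)              # [(surface, ne_type), ...]
    translations = []
    for surface, ne_type in entities:
        sentence = sentence.replace(surface, f"@{ne_type}@", 1)   # class token in the source
        translations.append(translate_entity(surface, ne_type))
    output = decode(sentence)                      # decoder only sees NE-typed placeholders
    for translation in translations:               # fill placeholders back in, left to right
        output = re.sub(r"@\w+@", lambda m: translation, output, count=1)
    return output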
Citations: 2
Using prior knowledge to assess relevance in speech summarization
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777867
Ricardo Ribeiro, David Martins de Matos
We explore the use of topic-based, automatically acquired prior knowledge in speech summarization, assessing its influence across several term-weighting schemes. All information is combined using latent semantic analysis as the core procedure for computing the relevance of the sentence-like units of the given input source. Evaluation is performed using the self-information measure, which tries to capture the informativeness of the summary relative to the summarized input source. The similarity of the output summaries produced by the several approaches is also analyzed.
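A minimal Python sketch of LSA-style sentence ranking under a simple term-weighting scheme, using an SVD of the term-by-sentence matrix. The raw term-frequency weighting, vocabulary handling, and omission of the self-information evaluation are simplifying assumptions rather than the paper's exact setup.

import numpy as np

def rank_sentences(sentences, k=2):
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] += 1.0                  # raw term-frequency weighting (one possible scheme)
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(S))                             # guard against very small inputs
    # relevance of sentence j = norm of its representation in the top-k latent space
    scores = np.linalg.norm(np.diag(S[:k]) @ Vt[:k, :], axis=0)
    return sorted(range(len(sentences)), key=lambda j: -scores[j])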
Citations: 3
Methods for improving the quality of syllable based speech synthesis
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777832
Y. R. Venugopalakrishna, M. V. Vinodh, H. Murthy, C. S. Ramalingam
Our earlier work [1] on speech synthesis has shown that syllables can produce reasonably natural quality speech. Nevertheless, audible artifacts are present due to discontinuities in pitch, energy, and formant trajectories at the joining point of the units. In this paper, we present some minimal signal modification techniques for reducing these artifacts.
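A minimal Python sketch of two signal modifications of the kind such a system might apply at a unit join: boundary energy matching and a short linear crossfade. The paper's exact modifications (including pitch and formant-trajectory smoothing) are not reproduced here.

import numpy as np

def join_units(left: np.ndarray, right: np.ndarray, fade: int = 160) -> np.ndarray:
    # energy matching: scale the incoming unit so boundary RMS levels agree
    l_rms = np.sqrt(np.mean(left[-fade:] ** 2) + 1e-12)
    r_rms = np.sqrt(np.mean(right[:fade] ** 2) + 1e-12)
    right = right * (l_rms / r_rms)

    # short linear crossfade over `fade` samples around the joining point
    ramp = np.linspace(0.0, 1.0, fade)
    overlap = left[-fade:] * (1.0 - ramp) + right[:fade] * ramp
    return np.concatenate([left[:-fade], overlap, right[fade:]])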
Citations: 16
Adaptive filtering for high quality HMM based speech synthesis
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777835
L. Coelho, D. Braga
In this work, an adaptive filtering scheme based on dual Discrete Kalman Filtering (DKF) is proposed for enhancing the quality of Hidden Markov Model (HMM) based speech synthesis. The objective is to improve signal smoothness across HMMs and their related states and to reduce artifacts due to the acoustic model's limitations. Both speech and artifacts are modelled by an autoregressive structure, which provides an underlying time-frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The quality enhancement is performed by a dual discrete Kalman filter that simultaneously estimates the models and the signals. The system's performance has been evaluated using mean opinion score tests, and the proposed technique has led to improved results.
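A minimal Python sketch of a single discrete Kalman filter over an AR(p) state-space model of the signal (companion-form transition matrix). The paper's dual DKF also estimates the model parameters jointly with the signal; here the AR coefficients are assumed known, which is a simplifying assumption.

import numpy as np

def kalman_ar_filter(y, ar_coeffs, q=1e-4, r=1e-2):
    p = len(ar_coeffs)
    F = np.zeros((p, p)); F[0, :] = ar_coeffs; F[1:, :-1] = np.eye(p - 1)   # companion form
    H = np.zeros((1, p)); H[0, 0] = 1.0
    Q = q * np.eye(p); R = np.array([[r]])
    x = np.zeros((p, 1)); P = np.eye(p)
    out = np.zeros(len(y))
    for t, obs in enumerate(y):
        x = F @ x; P = F @ P @ F.T + Q                      # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)        # Kalman gain
        x = x + K @ (np.array([[obs]]) - H @ x)             # update with the noisy sample
        P = (np.eye(p) - K @ H) @ P
        out[t] = x[0, 0]                                    # filtered (smoothed) sample
    return out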
Citations: 0
Improving word segmentation for Thai speech translation
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777885
Paisarn Charoenpornsawat, Tanja Schultz
A vocabulary list and a language model are primary components of a speech translation system. Generating both from plain text is a straightforward task for English. However, it is quite challenging for Chinese, Japanese, or Thai, which are written without word boundary delimiters. For Thai word segmentation, maximal matching, a lexicon-based approach, is one of the popular methods. Nevertheless, this method relies heavily on the coverage of the lexicon. When the text contains an unknown word, the method usually produces a wrong boundary, and some words are then missed when extracting words from the segmented text. In this paper, we propose statistical techniques to tackle this problem. Based on different word segmentation methods, we develop various speech translation systems and show that the proposed method can significantly improve translation accuracy by about 6.42% BLEU points compared to the baseline system.
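A minimal Python sketch of lexicon-based maximal matching: at each position the longest dictionary word is taken, and unmatched characters fall out as single-character unknown tokens, which is exactly the failure mode the statistical techniques address. The example uses Latin text for readability; the logic is script-independent.

def maximal_matching(text: str, lexicon: set, max_len: int = 20):
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:               # longest dictionary match wins
                tokens.append(text[i:j]); i = j
                break
        else:                                      # unknown character: emit it as-is
            tokens.append(text[i]); i += 1
    return tokens

print(maximal_matching("speechtranslation", {"speech", "translation", "trans"}))
# ['speech', 'translation']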
Citations: 7
Corpus-based synthesis of Mandarin speech with F0 contours generated by superposing tone components on rule-generated phrase components
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777833
K. Hirose, Qinghua Sun, N. Minematsu
Mandarin speech synthesis was conducted by generating prosodic features with the proposed method and segmental features with an HMM-based method. The proposed method generates sentence fundamental frequency (F0) contours by representing them as a superposition of tone components on phrase components. The tone components are realized by concatenating their fragments at tone nuclei predicted by a corpus-based method, while the phrase components are generated by rules under the generation process model (F0 model) framework. As a first step, the method predicts phoneme/pause durations with a statistical method. A listening test on the quality of the synthetic speech showed that the proposed method obtains better quality than the fully HMM-based method. Better quality was also obtained compared to generating F0 contours without the superpositional scheme.
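A minimal Python sketch of superposing tone components on a phrase component in the log-F0 domain, in the spirit of the generation process (Fujisaki) model referred to above. The command timings, amplitudes, and alpha/beta constants are illustrative assumptions, not values from the paper.

import numpy as np

def phrase_component(t, t0, amp, alpha=2.0):
    # second-order critically damped response to a phrase command at t0
    d = np.maximum(t - t0, 0.0)
    return amp * (alpha ** 2) * d * np.exp(-alpha * d)

def tone_component(t, on, off, amp, beta=20.0, ceiling=0.9):
    # step response of the tone/accent control mechanism between onset and offset
    def step(d):
        d = np.maximum(d, 0.0)
        return np.minimum(1.0 - (1.0 + beta * d) * np.exp(-beta * d), ceiling)
    return amp * (step(t - on) - step(t - off))

t = np.linspace(0.0, 2.0, 200)
log_f0 = (np.log(120.0)                                   # baseline F0 of 120 Hz (assumed)
          + phrase_component(t, 0.0, 0.5)
          + tone_component(t, 0.3, 0.6, 0.4)              # rising tone fragment
          + tone_component(t, 0.9, 1.3, -0.3))            # falling tone fragment
f0 = np.exp(log_f0)                                       # superposed F0 contour in Hz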
Citations: 0