5th International Conference on Spoken Language Processing (ICSLP 1998): Latest Papers
Phonetic modification of the syllable /tu/ in two spontaneous American English dialogues
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-673
N. Veilleux, S. Shattuck-Hufnagel
In a pilot study of phonetic modification of function words in 2 spontaneous speech dialogues, 99 utterances of the syllable /tu/ corresponding to to, two, too, -to and to- included the pronunciation variants [t h u, t h , D , d , n , s , s, t h , , t h ]. Factors influencing phonetic modification included phonetic context, prosody, part of speech, adjacent disfluency and individual speaker. 11% of the acoustic landmarks defining /t/ closure, /t/ release and vowel jaw opening maximum were not detectable in hand labelling. In a separate corpus, 59% of recognition errors involved grammatical or function words like conjunctions, articles, prepositions, pronouns and auxiliary verbs, and for 17 tokens of /tu/, half were misrecognized. Implications of these preliminary results for linguistic theory, cognitive modelling of speech processing and automatic speech recognition are discussed.
Citations: 4
A bootstrap training approach for language model classifiers
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-770
V. Warnke, E. Nöth, J. Buckow, S. Harbeck, H. Niemann
In this paper, we present a bootstrap training approach for language model (LM) classifiers. When class-dependent LMs are trained and run in parallel, they can serve as classifiers for any kind of symbol sequence, e.g., word or phoneme sequences for tasks like topic spotting or language identification (LID). Irrespective of the particular symbol sequence used for an LM classifier, an LM is trained on a manually labeled training set for each class, obtained from speakers who are not necessarily cooperative. We therefore have to face some erroneous labels and deviations from the originally intended class specification. Both can worsen classification. It might therefore be better not to use all utterances for training but to automatically select those utterances that improve recognition accuracy; this can be done by a bootstrap procedure. We present the results achieved with our best approach on the VERBMOBIL corpus for the tasks of dialog act classification and LID.
Citations: 0
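The selection idea in the bootstrap-training abstract above can be illustrated with a small sketch. The abstract does not spell out the procedure, so everything here is an assumption made for illustration: add-alpha smoothed unigram models stand in for the class-dependent LMs, and an utterance is kept only if the current models assign it to its labeled class. The names `train_unigram`, `classify` and `bootstrap_select` are hypothetical.

```python
import math
from collections import Counter

def train_unigram(seqs, vocab, alpha=1.0):
    """Add-alpha smoothed unigram log-probabilities over a fixed vocabulary."""
    counts = Counter(sym for seq in seqs for sym in seq)
    total = sum(counts.values()) + alpha * len(vocab)
    return {sym: math.log((counts[sym] + alpha) / total) for sym in vocab}

def train_all(data, vocab):
    """Train one LM per class from (symbol_sequence, label) pairs."""
    by_class = {}
    for seq, label in data:
        by_class.setdefault(label, []).append(seq)
    return {c: train_unigram(seqs, vocab) for c, seqs in by_class.items()}

def classify(seq, models):
    """Score the sequence with every class LM and return the best class."""
    return max(models, key=lambda c: sum(models[c][s] for s in seq))

def bootstrap_select(data, vocab, rounds=3):
    """Iteratively drop utterances whose label disagrees with the current LMs.

    Assumed selection rule (not taken from the paper): keep an utterance
    only if the models trained on the current subset classify it as labeled.
    All symbols in the data must be members of vocab.
    """
    kept = list(data)
    for _ in range(rounds):
        models = train_all(kept, vocab)
        filtered = [(s, l) for s, l in data if classify(s, models) == l]
        classes_left = {l for _, l in filtered}
        # Stop when the subset is stable or a class would lose all its data.
        if filtered == kept or classes_left != set(models):
            break
        kept = filtered
    return train_all(kept, vocab), kept
```

Called on a small labeled set containing one mislabeled utterance, `bootstrap_select` returns the retrained models together with the subset that survived selection; a real system would use higher-order n-gram models and a held-out set to decide which utterances help.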
Towards robust methods for spoken document retrieval
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-480
Kenney Ng
In this paper, we investigate a number of robust indexing and retrieval methods in an effort to improve spoken document retrieval performance in the presence of speech recognition errors. In particular, we examine expanding the original query representation to include confusible terms; developing a new document-query retrieval measure based on approximate matching that is less sensitive to recognition errors; expanding the document representation to include multiple recognition hypotheses; modifying the original query using automatic relevance feedback to include new terms found in the top ranked documents; and combining information from multiple subword unit representations. We study the different methods individually and then explore the effects of combining them. Experiments on radio broadcast news data show that using a combination of these methods can improve retrieval performance by over 20%.
Citations: 43
The microprosodics of tone sandhi in Shanghai disyllabic compounds
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-143
X. Zhu
This paper examines the F0 variations during tone sandhi due to various prosodic factors such as phonation type, length, stress and pitch height. It will be shown that the F0 height and shape of the second syllable (S2) in disyllabic words are determined by the interaction of four conditions: the intervocalic consonant (C2) voicing, the S2 truncation, the F0 height of S1, and stress assignment. Using disyllabic words without C2, 1998 demonstrates smoothly flowing F0 for Shanghai tonal rightward spreading in both and terms. These can be taken to reflect the underlying tension of the vocal cords without influence from supraglottal effects and represent a baseline in terms of which the perturbations observed on items with C2 can be understood.
Citations: 1
Bayesian constrained frequency warping HMMs for speaker normalisation
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-426
Ching-Hsiang Ho, S. Vaseghi, Aimin Chen
This paper presents a Bayesian constrained frequency warping technique. The Bayesian approach allows prior information about the frequency warping parameter to be included and the search range to be adjusted in order to obtain the best warping factor dependent on HMMs. We introduce novel frequency warping (FWP) HMMs, which are differently warped versions of HMMs. Instead of frequency warping the input speech, we warp the spectrum of the HMMs. This is equivalent to HMMs that have both time and frequency warping capabilities. Experimentally, FWP HMMs outperform the conventional constrained frequency warping approach. Furthermore, the best warping factor is estimated in two stages, a coarse stage followed by a fine stage. This method efficiently gauges the optimal warping factor and normalises the FWP HMMs.
Citations: 0
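The two-stage (coarse, then fine) estimate of the warping factor mentioned in the abstract above can be sketched as a grid search over a prior-weighted objective. This is only an illustration under stated assumptions: the Gaussian prior, the search range 0.8 to 1.2, the step sizes, and the names `grid`, `log_gaussian_prior` and `map_warp_factor` are all hypothetical, and the `log_likelihood` callable stands in for scoring an utterance against HMMs warped by the factor `alpha`.

```python
import math

def grid(lo, hi, step):
    """Inclusive grid from lo to hi with the given step."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]

def log_gaussian_prior(alpha, mean=1.0, std=0.1):
    """Assumed Gaussian prior on the warping factor, centred on no warping."""
    return -0.5 * ((alpha - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def map_warp_factor(log_likelihood, lo=0.8, hi=1.2, coarse=0.05, fine=0.01):
    """Two-stage search for the alpha maximising log-likelihood plus log-prior."""
    def objective(alpha):
        return log_likelihood(alpha) + log_gaussian_prior(alpha)

    # Coarse stage: scan the whole range with a large step.
    best = max(grid(lo, hi, coarse), key=objective)
    # Fine stage: rescan one coarse step either side of the coarse winner.
    fine_lo, fine_hi = max(lo, best - coarse), min(hi, best + coarse)
    return max(grid(fine_lo, fine_hi, fine), key=objective)
```

With the default steps the coarse stage evaluates 9 candidates and the fine stage 11, compared with 41 for a single pass at the fine resolution over the same range.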
A discourse coding scheme for conversational Spanish
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-492
Lori S. Levin, Ann E. Thymé-Gobbel, A. Lavie, K. Ries, K. Zechner
This paper describes a 3-level manual discourse coding scheme that we have devised for manual tagging of the CallHome Spanish (CHS) and CallFriend Spanish (CFS) databases used in the CLARITY project. The goal of CLARITY is to explore the use of discourse structure in understanding conversational speech. The project combines empirical methods for dialogue processing with state-of-the-art LVCSR (using the JANUS recognizer). The three levels of the coding scheme are (1) a speech act level consisting of a tag set extended from DAMSL and Switchboard; (2) a dialogue game level defined by initiative and intention; and (3) an activity level defined within topic units. The manually tagged dialogues are used to train automatic classifiers. We present preliminary results for statement categorization, and give an in-progress report of automatic speech act classification and topic boundary identification.
Citations: 36
Boundaries of perception of long tones in Taiwanese speech
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-454
Fran H. L. Jian
In this work we set out to investigate the fundamental frequency boundaries of perception of the Taiwanese long tones. We are interested in how the variations in fundamental frequency affect the perception of linguistic tones in Taiwanese speech. Our investigation is adapted from similar studies of tones in Mandarin speech. As opposed to Mandarin tones, which can be perceived with little difficulty, the seven Taiwanese tones have a more subtle structure and are consequently harder to perceive successfully. The experimental results in this paper allow us to quantify these perceptual boundaries. The experiments consisted of a perception test involving over 150 Taiwanese subjects, in which the task was to identify the tone of words played back in a random sequence. The stimuli consisted of a set of tone pairs and a selection of intermediate tone words obtained by linearly interpolating between the words of the tone pairs.
Citations: 0
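The stimulus construction described in the abstract above, intermediate tone words obtained by linear interpolation between the two members of a tone pair, can be sketched as follows. The abstract does not say which representation was interpolated; this sketch assumes frame-aligned F0 contours in Hz of equal length, and `interpolate_contours` is a hypothetical helper name.

```python
def interpolate_contours(f0_a, f0_b, n_intermediate):
    """Return n_intermediate F0 contours evenly spaced between f0_a and f0_b.

    f0_a and f0_b are equal-length lists of F0 values in Hz (an assumption;
    real stimuli would first need time alignment of the two words).
    """
    if len(f0_a) != len(f0_b):
        raise ValueError("contours must have the same number of frames")
    steps = []
    for k in range(1, n_intermediate + 1):
        w = k / (n_intermediate + 1)  # weight given to the second tone
        steps.append([(1 - w) * a + w * b for a, b in zip(f0_a, f0_b)])
    return steps
```

For example, three intermediate steps between a level contour `[100, 100]` and a rising contour `[200, 300]` give weights of 0.25, 0.5 and 0.75 on the rising tone, so listeners hear a graded continuum between the two endpoints.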
A model for speech reverberation and intelligibility restoring filters
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-310
O. Kenny, D. Nelson
The problem of removing channel effects from speech has generally been attacked by attempting to recover a time-varying filter which inverts the entire channel impulse response. We show that human listeners are insensitive to many channel conditions and that the human ear seems to respond primarily to discontinuities of the channel. As a result of these observations, a partial equalization is proposed in which the channel effects to which the ear is sensitive may be removed, without full inversion of the channel. In addition, it is shown that it is possible to build filters of arbitrary length which do not reduce speech intelligibility and do not produce annoying artifacts.
Citations: 1
Partitioning and transcription of broadcast news data
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-618
J. Gauvain, L. Lamel, G. Adda
Radio and television broadcasts consist of a continuous stream of data comprised of segments of different linguistic and acoustic natures, which poses challenges for transcription. In this paper we report on our recent work in transcribing broadcast news data [2, 4], including the problem of partitioning the data into homogeneous segments prior to word recognition. Gaussian mixture models are used to identify speech and non-speech segments. A maximum-likelihood segmentation/clustering process is then applied to the speech segments using GMMs and an agglomerative clustering algorithm. The clustered segments are then labeled according to bandwidth and gender. The recognizer is a continuous mixture density, tied-state cross-word context-dependent HMM system with a 65k trigram language model. Decoding is carried out in three passes, with a final pass incorporating cluster-based test-set MLLR adaptation. The overall word transcription error on the Nov’97 unpartitioned evaluation test data was 18.5%.
Citations: 182
Nonlinear interpolation of topic models for language model adaptation
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-667
K. Seymore, Stanley F. Chen, R. Rosenfeld
Topic adaptation for language modeling is concerned with adjusting the probabilities in a language model to better reflect the expected frequencies of topical words for a new document. The language model to be adapted is usually built from large amounts of training text and is considered representative of the current domain. In order to adapt this model for a new document, the topic (or topics) of the new document are identified. Then, the probabilities of words that are more likely to occur in the identified topic(s) than in general are boosted, and the probabilities of words that are unlikely for the identified topic(s) are suppressed. We present a novel technique for adapting a language model to the topic of a document, using a nonlinear interpolation of n-gram language models. A three-way, mutually exclusive division of the vocabulary into general, on-topic and off-topic word classes is used to combine word predictions from a topic-specific and a general language model. We achieve a slight decrease in perplexity and speech recognition word error rate on a Broadcast News test set using these techniques. Our results are compared to results obtained through linear interpolation of topic models.
Citations: 33
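The linear-interpolation baseline that the abstract above compares against can be sketched in a few lines. This shows only the standard linear mixture of a topic-specific and a general model, not the paper's nonlinear method, whose details the abstract does not give. Unigram dictionaries are used for brevity, and `interpolate`, `perplexity` and the weight `lam` are illustrative names.

```python
import math

def interpolate(p_topic, p_general, lam):
    """Linear interpolation: lam * P_topic(w) + (1 - lam) * P_general(w)."""
    vocab = set(p_topic) | set(p_general)
    return {
        w: lam * p_topic.get(w, 0.0) + (1 - lam) * p_general.get(w, 0.0)
        for w in vocab
    }

def perplexity(model, words):
    """Perplexity of a unigram model on a word list (every word needs P > 0)."""
    log_sum = sum(math.log(model[w]) for w in words)
    return math.exp(-log_sum / len(words))
```

On text that matches the topic, the mixture assigns topical words more probability than the general model alone, which shows up as lower perplexity; the weight `lam` would normally be tuned on held-out data.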