
SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. IEEE Workshop on Spoken Language Technology: latest articles

StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li, Cong Han, Nima Mesgarani

One-shot voice conversion (VC) aims to convert speech from any source speaker to an arbitrary target speaker with only a few seconds of reference speech from the target speaker. This relies heavily on disentangling the speaker's identity and speech content, a task that remains challenging. Here, we propose a novel approach to learning disentangled speech representation by transfer learning from style-based text-to-speech (TTS) models. With cycle-consistent and adversarial training, the style-based TTS models can perform transcription-guided one-shot VC with high fidelity and similarity. By learning an additional mel-spectrogram encoder through teacher-student knowledge transfer and a novel data augmentation scheme, our approach results in disentangled speech representation without needing the input text. The subjective evaluation shows that our approach can significantly outperform the previous state-of-the-art one-shot voice conversion models in both naturalness and similarity.

DOI: 10.1109/slt54892.2023.10022498. Volume 2022, pp. 920-927. Published 2023-01-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10417535/pdf/nihms-1919646.pdf
Citations: 6
Computational Analysis of Trajectories of Linguistic Development in Autism
Emily Prud'hommeaux, Eric Morley, Masoud Rouhizadeh, Laura Silverman, Jan van Santen, Brian Roark, Richard Sproat, Sarah Kauper, Rachel DeLaHunta

Deficits in semantic and pragmatic expression are among the hallmark linguistic features of autism. Recent work in deriving computational correlates of clinical spoken language measures has demonstrated the utility of automated linguistic analysis for characterizing the language of children with autism. Most of this research, however, has focused either on young children still acquiring language or on small populations covering a wide age range. In this paper, we extract numerous linguistic features from narratives produced by two groups of children with and without autism from two narrow age ranges. We find that although many differences between diagnostic groups remain constant with age, certain pragmatic measures, particularly the ability to remain on topic and avoid digressions, seem to improve. These results confirm findings reported in the psychology literature while underscoring the need for careful consideration of the age range of the population under investigation when performing clinically oriented computational analysis of spoken language.

DOI: 10.1109/SLT.2014.7078585. Volume 2014, pp. 266-271. Published 2014-12-01.
Citations: 8
Robust Detection of Voiced Segments in Samples of Everyday Conversations Using Unsupervised HMMs
Meysam Asgari, Izhak Shafran, Alireza Bayestehtashk

We investigate methods for detecting voiced segments in everyday conversations from ambient recordings. Such recordings contain a high diversity of background noise, making it difficult or infeasible to collect representative labelled samples for estimating noise-specific HMM models. The popular utility get-f0 and its derivatives compute normalized cross-correlation for detecting voiced segments, which unfortunately is sensitive to different types of noise. Exploiting the fact that voiced speech is not just periodic but also rich in harmonics, we model voiced segments by adopting harmonic models, which have recently gained considerable attention. In previous work, the parameters of the model were estimated independently for each frame using a maximum likelihood criterion. However, since the distribution of harmonic coefficients depends on the articulators of speakers, we estimate the model parameters more robustly using a maximum a posteriori criterion. We use the likelihood of voicing, computed from the harmonic model, as an observation probability of an HMM and detect speech using this unsupervised HMM. The one caveat of the harmonic model is that it fails to distinguish speech from other stationary harmonic noise. We rectify this weakness by taking advantage of the non-stationary property of speech. We evaluate our models empirically on a task of detecting speech in a large corpus of everyday speech and demonstrate that these models perform significantly better than the standard voice detection algorithm employed in popular tools.
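The decoding step described in this abstract, where a per-frame voicing likelihood is used as the observation probability of an HMM, can be illustrated with a two-state Viterbi pass. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `viterbi_voicing`, the `p_stay` self-transition probability, and the toy likelihood values are hypothetical, and neither the harmonic model that would produce the likelihoods nor the non-stationarity check is reproduced.

```python
import math

def viterbi_voicing(p_voiced, p_stay=0.9):
    """Decode 0/1 (unvoiced/voiced) labels from per-frame voicing likelihoods.

    p_voiced: likelihoods in (0, 1), one per frame (assumed non-empty).
    p_stay:   assumed probability of remaining in the same state.
    """
    eps = 1e-6
    probs = [min(max(p, eps), 1 - eps) for p in p_voiced]
    # Observation log-likelihoods: index 0 = unvoiced, index 1 = voiced.
    log_obs = [(math.log(1 - p), math.log(p)) for p in probs]
    log_stay, log_switch = math.log(p_stay), math.log(1 - p_stay)

    score = [math.log(0.5) + log_obs[0][s] for s in (0, 1)]
    backptrs = []
    for obs in log_obs[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            # Best predecessor of state s: stay in s, or switch from 1 - s.
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            ptr.append(s if stay >= switch else 1 - s)
            new_score.append(max(stay, switch) + obs[s])
        score = new_score
        backptrs.append(ptr)

    # Trace the best path backwards from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# A single weak frame inside a voiced run is smoothed over by the transitions.
print(viterbi_voicing([0.9, 0.9, 0.4, 0.9, 0.9]))  # [1, 1, 1, 1, 1]
```

The point of the sketch is the smoothing effect: a frame-by-frame threshold at 0.5 would label the third frame unvoiced, whereas the HMM keeps it voiced because two state switches cost more than one weak observation.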

DOI: 10.1109/slt.2012.6424264. Volume 2012, pp. 438-442. Published 2012-12-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7909075/pdf/nihms-1670854.pdf
Citations: 0
Efficient prior and incremental beam width control to suppress excessive speech recognition time based on score range estimation
Satoshi Kobashikawa, Takaaki Hori, Y. Yamaguchi, Taichi Asami, H. Masataki, Satoshi Takahashi
DOI: 10.1109/SLT.2012.6424209. Volume 214 1, pp. 125-130. Published 2012-01-01.
Citations: 0
Speech Technology Opportunities and Challenges
D. Nahamoo
Summary form only given. Two forces are in pursuit of discovering the possibilities of speech technology automation. First is the global research and development community, which has been hard at work improving the performance and usability of the technology. Second is the business community, which constantly evaluates the performance of the technology against the expectations of the user community for delivering solutions such as a spoken car navigation system. While the performance improvement has been on a constant positive progress curve, the market opportunity has been on a much more uncertain curve. For example, the early vision of delivering a dictation solution has been on hold in recent years, although it enjoyed enormous interest in the 1990s. At the same time, some industry experts predict that this vision will be fulfilled soon because of the usability needs of billions of mobile devices in use today. Analogies can be drawn for the use of speech technologies in call center self-service interaction. While we have seen a much bigger market success there, some industry experts predict that web self-service will slow down the use of speech self-service. So, where does the truth lie? What are those market opportunities that are clear winners? What opportunities will open up in the future and what are their technical challenges? In this talk, we will address some of these questions.
DOI: 10.1109/SLT.2006.326778. Volume 34 1, p. 1. Published 2006-12-01.
Citations: 2
No More Strings, Please
Kevin Knight
Summary form only given. In natural language research, many (grammar) trees were felled in 1992, to make room for the highly successful string-based HMM industry. A small literature survived on parsing (putting a tree on a string) and syntactic language modeling (putting a weight on a string). However, trees are making a comeback. Tree transformations are turning out to be very useful in large-scale machine translation (MT), and we will cover recent developments in this area. Most of the tree techniques used in MT turn out to be generic, leading to tools and software for manipulating tree automata in general. Tree acceptors and transducers generalize HMM techniques to the world of trees, raising many interesting theoretical and practical problems.
DOI: 10.1109/SLT.2006.326779. Volume 60 1, p. 2. Published 2006-12-01.
Citations: 0
Information Extraction from Speech
J. Makhoul
Summary form only given. The state of the art in automatic speech recognition has reached the point that searching for and extracting information from large speech repositories or streaming audio has become a growing reality. This paper summarizes the technologies that have been instrumental in making audio as searchable as text, including speech recognition, speaker clustering, segmentation, and identification; topic classification; and story segmentation. Once speech is turned into text, information extraction methods can then be applied, such as named entity extraction, finding relationships between named entities, and resolution of anaphoric references. Examples of deployed systems for information extraction from speech, which incorporate some of the aforementioned technologies, will be given.
DOI: 10.1109/SLT.2006.326780. Volume 38 1, p. 3. Published 2006-12-01.
Citations: 34
Widening the NLP Pipeline for Spoken Language Processing
S. Bangalore
Summary form only given. A typical text-based natural language application (e.g., machine translation, summarization, information extraction) consists of a pipeline of preprocessing steps such as tokenization, stemming, part-of-speech tagging, named entity detection, chunking, and parsing. Information flows downstream through the preprocessing steps along a narrow pipe: each step transforms a single input string into a single best solution string. However, this narrow pipe is limiting for two reasons. First, since each of the preprocessing steps is error-prone, producing a single best solution could magnify the error propagation down the pipeline. Second, the preprocessing steps are forced to resolve genuine ambiguity prematurely. While the widening of the pipeline can potentially benefit text-based language applications, it is imperative for spoken language processing, where the output from the speech recognizer is typically a word lattice/graph. In this talk, we review how such a goal has been accomplished in tasks such as spoken language understanding, speech translation and multimodal language processing. We will also sketch methods that encode the preprocessing steps as finite-state transductions in order to exploit composition of finite-state transducers as a general constraint propagation method.
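The contrast between a narrow and a widened pipeline can be made concrete with a toy two-stage example. The sketch below is an illustration only: it uses a short scored hypothesis list as a stand-in for a word lattice, it does not use finite-state transducers, and every name and score in it (`narrow_pipeline`, `wide_pipeline`, `toy_tagger`, the log-scores) is invented for the example rather than taken from the talk.

```python
def narrow_pipeline(stage1_hyps, stage2):
    """Commit to the single best stage-1 output before running stage 2."""
    text, s1 = max(stage1_hyps, key=lambda h: h[1])
    label, s2 = max(stage2(text), key=lambda h: h[1])
    return text, label, s1 + s2

def wide_pipeline(stage1_hyps, stage2):
    """Carry every stage-1 alternative forward and pick the best joint score."""
    candidates = [
        (text, label, s1 + s2)
        for text, s1 in stage1_hyps
        for label, s2 in stage2(text)
    ]
    return max(candidates, key=lambda c: c[2])

# Hypothetical recognizer output: (string, log-score) pairs.
asr_hyps = [("wreck a nice beach", -0.6), ("recognize speech", -0.8)]

def toy_tagger(text):
    """Hypothetical second stage: scored labels for each input string."""
    table = {
        "wreck a nice beach": [("statement", -2.0), ("command", -2.5)],
        "recognize speech": [("command", -0.3), ("statement", -1.5)],
    }
    return table[text]

print(narrow_pipeline(asr_hyps, toy_tagger)[:2])  # ('wreck a nice beach', 'statement')
print(wide_pipeline(asr_hyps, toy_tagger)[:2])    # ('recognize speech', 'command')
```

In this toy setting the narrow pipeline locks in a slightly better-scoring but wrong first-stage string, while the widened pipeline lets the second stage's evidence overturn it. Enumerating all combinations is only feasible for short lists; a lattice-based system would presumably rely on composition instead.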
DOI: 10.1109/SLT.2006.326787. Volume 48 1, p. 15. Published 2006-12-01.
Citations: 0
Graph-Based Methods for Language Processing and Information Retrieval
Dragomir R. Radev
Summary form only given. A number of problems in information retrieval and natural language processing can be approached using graph theory. Some representative examples in IR include Brin and Page's PageRank and Kleinberg's HITS for document ranking using graph-based random walk models. In NLP, one could mention Pang and Lee's work on sentiment analysis using graph min-cuts, Mihalcea's work on word sense disambiguation, Zhu et al.'s label propagation algorithms, Toutanova et al.'s prepositional attachment algorithm, and McDonald et al.'s dependency parsing algorithm using minimum spanning trees. In this talk I will quickly summarize three graph-based algorithms developed recently at the University of Michigan: (a) LexRank, a method for multidocument summarization based on random walks on lexical centrality graphs, (b) TUMBL, a generic method using bipartite graphs for semi-supervised learning, and (c) biased LexRank, a semi-supervised technique for passage ranking for information retrieval. I will also discuss the applicability of such techniques to other problems in Natural Language Processing and Information Retrieval.
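The general idea of ranking sentences by a random walk on a similarity graph can be sketched in a few lines. The code below is a simplified illustration, not the LexRank algorithm as published: it assumes plain bag-of-words cosine similarity, a fixed threshold, and a PageRank-style damped power iteration, and the names `cosine`, `centrality_scores`, `threshold`, and `damping` are choices made for this example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def centrality_scores(sentences, threshold=0.2, damping=0.85, iters=100):
    """Score non-empty sentences by a damped random walk on their similarity graph."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Unweighted graph: link sentences whose similarity reaches the threshold.
    adj = [[1.0 if cosine(bags[i], bags[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalise; each non-empty sentence links to itself, so sums are > 0.
    trans = [[v / sum(row) for v in row] for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(trans[i][j] * scores[i] for i in range(n))
                  for j in range(n)]
    return scores

docs = [
    "graph methods rank sentences",
    "graph methods use random walks",
    "summaries rank sentences by centrality",
]
scores = centrality_scores(docs)
print(max(range(len(docs)), key=lambda i: scores[i]))  # 0
```

The first sentence shares vocabulary with both of the others, which do not overlap with each other, so it collects the most probability mass and would be the first candidate for a summary.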
DOI: 10.1109/SLT.2006.326781. Volume 6 1, p. 4. Published 2006-12-01.
Citations: 0
Model Adaptation for Dialog Act Tagging
Gökhan Tür, Ümit Güz, Dilek Z. Hakkani-Tür
In this paper, we analyze the effect of model adaptation for dialog act tagging. The goal of adaptation is to improve the performance of the tagger using out-of-domain data or models. Dialog act tagging aims to provide a basis for further discourse analysis and understanding in conversational speech. In this study we used the ICSI meeting corpus with high-level meeting recognition dialog act (MRDA) tags, that is, question, statement, backchannel, disruptions, and floor grabbers/holders. We performed controlled adaptation experiments using the Switchboard (SWBD) corpus with SWBD-DAMSL tags as the out-of-domain corpus. Our results indicate that we can achieve significantly better dialog act tagging by automatically selecting a subset of the Switchboard corpus and combining the confidences obtained by both in-domain and out-of-domain models via logistic regression, especially when the in-domain data is limited.
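The combination step mentioned at the end of this abstract, merging the confidences of an in-domain and an out-of-domain model through logistic regression, might look roughly like the sketch below. It is a hedged illustration only: the paper's features, tag mapping, and training setup are not given here, so the function names (`fit_combiner`, `combined_confidence`), the plain gradient-descent fit, and the toy confidence values are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_combiner(conf_pairs, labels, lr=0.5, epochs=2000):
    """Fit (w_in, w_out, bias) by batch gradient descent on the log loss.

    conf_pairs: (in-domain confidence, out-of-domain confidence) per example.
    labels:     1 if the hypothesised tag was correct, else 0.
    """
    w_in = w_out = bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        g_in = g_out = g_b = 0.0
        for (c_in, c_out), y in zip(conf_pairs, labels):
            err = sigmoid(w_in * c_in + w_out * c_out + bias) - y
            g_in += err * c_in
            g_out += err * c_out
            g_b += err
        w_in -= lr * g_in / n
        w_out -= lr * g_out / n
        bias -= lr * g_b / n
    return w_in, w_out, bias

def combined_confidence(weights, c_in, c_out):
    """Confidence that a tag is correct, given both models' confidences."""
    w_in, w_out, bias = weights
    return sigmoid(w_in * c_in + w_out * c_out + bias)

# Invented toy data: both models tend to be confident when the tag is right.
pairs = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9), (0.6, 0.7),
         (0.3, 0.2), (0.2, 0.4), (0.4, 0.1), (0.1, 0.3)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights = fit_combiner(pairs, labels)
print(combined_confidence(weights, 0.85, 0.85) > 0.5)  # True
```

With limited in-domain data, a learned combiner of this kind can weight the out-of-domain model more heavily where it is reliable, which is the scenario the abstract highlights.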
{"title":"Model Adaptation for Dialog Act Tagging","authors":"Gökhan Tür, Ümit Güz, Dilek Z. Hakkani-Tür","doi":"10.1109/SLT.2006.326825","DOIUrl":"https://doi.org/10.1109/SLT.2006.326825","url":null,"abstract":"In this paper, we analyze the effect of model adaptation for dialog act tagging. The goal of adaptation is to improve the performance of the tagger using out-of-domain data or models. Dialog act tagging aims to provide a basis for further discourse analysis and understanding in conversational speech. In this study we used the ICSI meeting corpus with high-level meeting recognition dialog act (MRDA) tags, that is, question, statement, backchannel, disruptions, and floor grabbers/holders. We performed controlled adaptation experiments using the Switchboard (SWBD) corpus with SWBD-DAMSL tags as the out-of-domain corpus. Our results indicate that we can achieve significantly better dialog act tagging by automatically selecting a subset of the Switchboard corpus and combining the confidences obtained by both in-domain and out-of-domain models via logistic regression, especially when the in-domain data is limited.","PeriodicalId":74811,"journal":{"name":"SLT ... : ... IEEE Workshop on Spoken Language Technology : proceedings. IEEE Workshop on Spoken Language Technology","volume":"204 1","pages":"94-97"},"PeriodicalIF":0.0,"publicationDate":"2006-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77023227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
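The abstract for "Model Adaptation for Dialog Act Tagging" mentions combining the confidences of in-domain and out-of-domain models via logistic regression. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a binary decision for a single tag (for example, "question" vs. not), one confidence score per model, and a plain gradient-descent fit. The function names `train_combiner` and `combine` and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_combiner(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights and bias by batch gradient descent.

    features: list of (in_domain_conf, out_of_domain_conf) pairs
    labels:   list of 0/1 gold labels for the tag being predicted
    """
    n_feats = len(features[0])
    w = [0.0] * n_feats
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_feats
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Prediction error for this example under the current parameters.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_feats):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average-gradient update.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def combine(w, b, x):
    """Combined probability from the two model confidences."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical toy data: (in-domain confidence, out-of-domain confidence).
toy_features = [
    (0.9, 0.8), (0.8, 0.6), (0.7, 0.9), (0.6, 0.7),
    (0.3, 0.2), (0.2, 0.4), (0.4, 0.1), (0.1, 0.3),
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_combiner(toy_features, toy_labels)
high = combine(w, b, (0.9, 0.9))  # both models confident
low = combine(w, b, (0.1, 0.1))   # both models doubtful
```

In this sketch the learned weights determine how much each model's confidence contributes to the final decision. With limited in-domain data, the combiner can in principle lean more on the out-of-domain model where that model is reliable.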