
Latest publications from the 2008 IEEE Spoken Language Technology Workshop

Multilingual spoken-password based user authentication in emerging economies using cellular phone networks
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777826
A. Das, O. K. Manyam, Makarand Tapaswi, Veeresh Taranalli
Mobile phones are playing an important role in changing the socio-economic landscape of emerging economies like India. Reliable voice-based user authentication would enable many new mobile applications, including mobile commerce and banking. We present our exploration and evaluation of an experimental set-up for user authentication in remote Indian villages using mobile phones and user-selected multilingual spoken passwords. We also present an effective speaker recognition method based on a set of novel features, Compressed Feature Dynamics (CFD), which capture speaker identity from the speech dynamics contained in the spoken passwords. Early trials demonstrate the effectiveness of the proposed method on noisy cell-phone speech. Compared to conventional text-dependent speaker recognition methods, the proposed CFD method delivers competitive performance while significantly reducing storage and computational complexity, an advantage that is highly beneficial for cell-phone based deployment of such user authentication systems.
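To illustrate the storage and complexity argument, the sketch below (ours, not the authors') compresses each feature trajectory of a password utterance into a fixed-length vector with a truncated DCT and compares enrollment and test vectors by cosine similarity. The feature matrices, the DCT-based compression standing in for CFD, and the decision threshold are all assumptions made only for illustration.

```python
# A minimal sketch of fixed-length "feature dynamics" compression for a
# text-dependent password check. The actual CFD features of Das et al. are
# not reproduced here; truncated-DCT compression of each feature trajectory
# is an assumption used to illustrate the storage/complexity argument.
import numpy as np
from scipy.fft import dct

def compress_dynamics(features: np.ndarray, n_coeffs: int = 20) -> np.ndarray:
    """Compress a (frames x dims) feature matrix into one fixed-length vector.

    Each feature dimension's temporal trajectory is reduced to its first
    n_coeffs DCT coefficients, so utterances of different lengths (with at
    least n_coeffs frames) map to vectors of identical size.
    """
    coeffs = dct(features, axis=0, norm="ortho")[:n_coeffs]  # (n_coeffs x dims)
    return coeffs.flatten()

def verify(enrolled: np.ndarray, test: np.ndarray, threshold: float = 0.85) -> bool:
    """Accept the claimed speaker if the compressed vectors are similar enough."""
    cos = float(np.dot(enrolled, test) /
                (np.linalg.norm(enrolled) * np.linalg.norm(test)))
    return cos >= threshold

# Usage with random stand-ins for two password utterances (13-dim features):
enrol_feats = np.random.randn(180, 13)  # enrollment utterance, 180 frames
test_feats = np.random.randn(165, 13)   # later attempt, different length
print(verify(compress_dynamics(enrol_feats), compress_dynamics(test_feats)))
```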
Citations: 18
Modeling vocal interaction for text-independent detection of involvement hotspots in multi-party meetings
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777845
K. Laskowski
Indexing, retrieval, and summarization in recordings of meetings have, to date, focused largely on the propositional content of what participants say. Although objectively relevant, such content may not be the sole or even the main aim of potential system users. Instead, users may be interested in information bearing on conversation flow. We explore the automatic detection of one example of such information, namely that of hotspots defined in terms of participant involvement. Our proposed system relies exclusively on low-level vocal activity features, and yields a classification accuracy of 84%, representing a 39% reduction of error relative to a baseline which selects the majority class.
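For readers checking the numbers: the 84% accuracy and the 39% relative error reduction jointly imply a majority-class baseline of roughly 74% accuracy. The quick arithmetic below is our own sanity check, not a figure taken from the paper.

```python
# 84% accuracy means a 16% error rate; a 39% relative error reduction implies
# a baseline (majority-class) error of about 0.16 / (1 - 0.39) ~ 26%.
system_error = 1.0 - 0.84
relative_reduction = 0.39
baseline_error = system_error / (1.0 - relative_reduction)
print(f"implied baseline accuracy = {1.0 - baseline_error:.1%}")  # ~ 73.8%
```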
Citations: 14
Incorporating discourse context in spoken language translation through dialog acts
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777892
V. Sridhar, Shrikanth S. Narayanan, S. Bangalore
Current statistical speech translation approaches predominantly rely on just text transcripts and are limited in their use of rich contextual information such as prosody and discourse function. In this paper, we explore the role of discourse context characterized through dialog acts (DAs) in statistical translation. We present a bag-of-words (BOW) model that exploits DA tags in translation and contrast it with a phrase table interpolation approach presented in previous work. In addition to producing interpretable DA-annotated target language translations through our framework, we also obtain consistent improvements in terms of automatic evaluation metrics such as lexical selection accuracy and BLEU score using both the models. We also analyze the performance improvements per DA tag. Our experiments indicate that questions, acknowledgments, agreements and appreciations contribute to more improvement in comparison to statements.
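To make the contrasted phrase-table interpolation concrete, here is a minimal sketch of the general idea (ours): translation probabilities from a dialog-act-specific table are linearly mixed with those of a general table. The phrase pairs, probabilities, and interpolation weight are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of phrase-table interpolation with a dialog-act (DA)
# specific table: p(e|f) = lam * p_DA(e|f) + (1 - lam) * p_general(e|f).
# All table contents and the weight below are made up for illustration.
from collections import defaultdict

def interpolate(general: dict, da_specific: dict, lam: float = 0.3) -> dict:
    """Mix a DA-specific phrase table into a general one with weight lam."""
    mixed = defaultdict(float)
    for pair, p in general.items():
        mixed[pair] += (1.0 - lam) * p
    for pair, p in da_specific.items():
        mixed[pair] += lam * p
    return dict(mixed)

general_table = {("gracias", "thanks"): 0.6, ("gracias", "thank you"): 0.4}
thanks_da_table = {("gracias", "thank you"): 0.9, ("gracias", "thanks"): 0.1}
print(interpolate(general_table, thanks_da_table))
```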
Citations: 3
Vowel-based frequency alignment function design and recognition-based time alignment for automatic speech morphing
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777831
Masato Onishi, Toru Takahashi, T. Irino, Hideki Kawahara
New design procedures of time-frequency alignment for automatic speech morphing are proposed. The frequency alignment function at a specific frame is represented as a weighted average of vowel alignment functions based on similarity to each vowel. Julian, an open source speech recognition system, was used to design a time alignment function. Objective and subjective tests were conducted to evaluate the proposed method, and test results indicated that the proposed method yields comparable naturalness to the manually morphed samples in terms of time alignment. The results also illustrated that the proposed frequency alignment provides significantly better naturalness than morphed samples without frequency alignment.
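A minimal sketch of the per-frame frequency alignment idea, under our own assumptions: the warp applied at a frame is a similarity-weighted average of per-vowel alignment functions. The toy warps and similarity weights below are placeholders; the paper's actual similarity measure and alignment functions are not reproduced.

```python
# Weighted-average frequency alignment: each frame's warp is a mix of
# per-vowel alignment functions, weighted by that frame's similarity to
# each vowel. Toy linear warps and weights are assumptions for illustration.
import numpy as np

def frame_alignment(frame_weights: dict, vowel_alignments: dict,
                    freqs: np.ndarray) -> np.ndarray:
    """Combine vowel alignment functions f_v(freq) using similarity weights."""
    total = sum(frame_weights.values())
    warped = np.zeros_like(freqs, dtype=float)
    for vowel, w in frame_weights.items():
        warped += (w / total) * vowel_alignments[vowel](freqs)
    return warped

freqs = np.linspace(0.0, 8000.0, 5)
vowel_alignments = {"a": lambda f: 1.05 * f, "i": lambda f: 0.95 * f}  # toy warps
weights = {"a": 0.7, "i": 0.3}  # similarity of this frame to each vowel
print(frame_alignment(weights, vowel_alignments, freqs))
```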
Citations: 1
Using hidden Markov models for topic segmentation of meeting transcripts
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777871
Melissa Sherman, Yang Liu
In this paper, we present a hidden Markov model (HMM) approach to segment meeting transcripts into topics. To learn the model, we use unsupervised learning to cluster the text segments obtained from topic boundary information. Using modified WinDiff and Pk metrics, we demonstrate that an HMM outperforms LCSeg, a state-of-the-art lexical chain based method for topic segmentation using the ICSI meeting corpus. We evaluate the effect of language model order, the number of hidden states, and the use of stop words. Our experimental results show that a unigram LM is better than a trigram LM, using too many hidden states degrades topic segmentation performance, and that removing the stop words from the transcripts does not improve segmentation performance.
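Since results are reported with the Pk metric, a short reference implementation may help; this is the standard textbook formulation (WindowDiff differs only in comparing the number of boundaries inside each window), not code from the paper, and the example label sequences are made up.

```python
# Pk: the probability that a window of size k straddles a topic boundary in
# one segmentation but not the other. Inputs are per-utterance segment IDs.
def pk(reference, hypothesis, k=None):
    assert len(reference) == len(hypothesis)
    if k is None:  # common default: half the mean reference segment length
        k = max(1, round(len(reference) / (2 * len(set(reference)))))
    errors = 0
    total = len(reference) - k
    for i in range(total):
        ref_same = reference[i] == reference[i + k]
        hyp_same = hypothesis[i] == hypothesis[i + k]
        errors += ref_same != hyp_same
    return errors / total

ref = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]  # reference topic IDs per utterance
hyp = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]  # hypothesized topic IDs
print(f"Pk = {pk(ref, hyp):.3f}")
```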
Citations: 26
Global syllable set for building speech synthesis in Indian languages
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777837
E. V. Raghavendra, Srinivas Desai, B. Yegnanarayana, A. Black, K. Prahallad
Indian languages are syllabic in nature, and many syllables are common across them. This motivates us to build a global syllable set that combines the syllables of multiple languages, so that a synthesizer can borrow units from another language when a required syllable is not found. Such a synthesizer draws on speech databases in different languages recorded from different speakers, so its output is likely to pick units from multiple languages; the synthesized utterance then contains units spoken by multiple speakers, which is annoying for the user. We therefore intend to use a cross-lingual voice conversion framework based on Artificial Neural Networks (ANN) to transform such an utterance into the voice of a single target speaker.
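The borrowing step described above can be pictured with a small lookup sketch (ours, not the authors' system): try the target language's syllable inventory first and fall back to the pooled multilingual inventory when the syllable is missing. The language codes, inventories, and syllables are illustrative assumptions.

```python
# Unit-selection fallback over a pooled multilingual syllable inventory.
# Inventories and syllable names below are made up for illustration.
def pick_unit(syllable: str, target_lang: str, inventories: dict):
    """Return (language, syllable) for the unit to concatenate, or None."""
    if syllable in inventories.get(target_lang, set()):
        return target_lang, syllable
    for lang, units in inventories.items():  # borrow from another language
        if lang != target_lang and syllable in units:
            return lang, syllable
    return None  # no language has the syllable; back off (e.g. to phones)

inventories = {"te": {"ka", "ma", "ra"}, "hi": {"ka", "pa", "ni"}}
print([pick_unit(s, "te", inventories) for s in ("ka", "ni", "zo")])
```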
Citations: 24
Latent semantic retrieval of spoken documents over position specific posterior lattices
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777896
Hung-lin Chang, Yi-Cheng Pan, Lin-Shan Lee
This paper presents a new approach to latent semantic retrieval of spoken documents over Position Specific Posterior Lattices (PSPL). Based on Probabilistic Latent Semantic Analysis (PLSA), the approach performs concept matching rather than literal term matching during retrieval, addressing the term mismatch between the query and the desired spoken documents. It operates over PSPL so that the multiple hypotheses generated by the ASR process, together with their position information, are taken into account, alleviating the problem of relatively poor ASR accuracy. We establish a framework for evaluating the semantic relevance between terms and the relevance score between a query and a PSPL, both based on the latent topic information from PLSA. Preliminary experiments on Chinese broadcast news segments showed that significant improvements can be obtained with the proposed approach.
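To picture concept matching rather than literal term matching, here is a heavily simplified sketch under our own assumptions: query terms and lattice terms are mapped to topic distributions P(z|w) of the kind PLSA provides, the document's topic mixture is weighted by each term's lattice posterior, and relevance is the cosine similarity of the two mixtures. The topic distributions, posteriors, and the exact scoring function are illustrative, not the paper's formulation.

```python
# Topic-space relevance: posterior-weighted topic mixtures for query and
# document, compared by cosine similarity. All numbers are made up.
import numpy as np

def topic_mixture(term_posteriors: dict, p_topic_given_term: dict) -> np.ndarray:
    """Posterior-weighted average of per-term topic distributions, normalised."""
    mix = sum(p * np.asarray(p_topic_given_term[t])
              for t, p in term_posteriors.items())
    return mix / np.sum(mix)

def relevance(query_mix: np.ndarray, doc_mix: np.ndarray) -> float:
    return float(np.dot(query_mix, doc_mix) /
                 (np.linalg.norm(query_mix) * np.linalg.norm(doc_mix)))

p_topic_given_term = {"election": [0.9, 0.1], "vote": [0.8, 0.2],
                      "typhoon": [0.05, 0.95]}
query = topic_mixture({"election": 1.0}, p_topic_given_term)
doc = topic_mixture({"vote": 0.7, "typhoon": 0.2}, p_topic_given_term)  # posteriors
print(f"relevance = {relevance(query, doc):.3f}")
```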
Citations: 12
Experience with developing and deploying an agricultural information system using spoken language technology in Kenya
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777829
R. Tucker, M. Gakuru
We describe the progress of the Local Language Speech Technology Initiative in Kenya, where since starting in 2003, technology and expertise have been successfully transferred to the Kenyan partners, culminating in the launch of the National Farmers Information Service (NAFIS) in April 2008. NAFIS is primarily a voice service accessed over the phone and offers a wide range of information in Kiswahili or Kenyan English, supplementing the existing agricultural extension services.
Citations: 6
Modelling multimodal user ID in dialogue
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777853
H. Holzapfel, A. Waibel
This paper presents an approach to modelling user ID in dialogue. A belief network integrates ID classifiers, such as face ID and voice ID, with person-related information, such as a person's first and last name obtained from speech recognition or spelling. Different network structures are analyzed, compared with each other, and compared with a rule-based user model. The approach is evaluated on dialogue data collected in a person identification scenario that covers both identification of known persons and interactive learning of the names and IDs of unknown persons.
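A minimal sketch of fusing independent ID classifiers, in the spirit of the belief network above but reduced to a naive Bayes combination: per-user likelihoods from face ID and voice ID are multiplied with a prior and renormalised. The scores below are illustrative assumptions, and the paper's actual network also folds in name information from ASR or spelling.

```python
# Naive Bayes fusion of ID classifier scores under a conditional
# independence assumption. All priors and scores are made up.
def fuse(prior: dict, *likelihoods: dict) -> dict:
    """Posterior P(user | all observations) up to renormalisation."""
    posterior = {}
    for user, p in prior.items():
        for lik in likelihoods:
            p *= lik.get(user, 1e-6)  # small floor for unseen users
        posterior[user] = p
    z = sum(posterior.values())
    return {u: p / z for u, p in posterior.items()}

prior = {"alice": 0.5, "bob": 0.5}
face_id = {"alice": 0.7, "bob": 0.3}   # normalised face-ID scores
voice_id = {"alice": 0.4, "bob": 0.6}  # normalised voice-ID scores
print(fuse(prior, face_id, voice_id))
```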
Citations: 4
Identifying salient utterances of online spoken documents using descriptive hypertext
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777868
Xiao-Dan Zhu, Siavash Kazemian, Gerald Penn
The Internet has become an important supply channel of spoken documents. Efficient ways of navigating their content are highly desirable. This paper aims to identify the most salient utterances from online spoken documents using relevant hypertext that encapsulates key information. Experimental results show that hypertext features are helpful when properly utilized and if the bit rates used to compress the spoken documents are reasonable.
Citations: 0