How to wreck a nice beach you sing calm incense

H. Lieberman, A. Faaborg, Waseem Daher, J. Espinosa
DOI: 10.1145/1040830.1040898 (https://doi.org/10.1145/1040830.1040898)
Published in: Proceedings of the 10th international conference on Intelligent user interfaces
Publication date: 2005-01-10
Citations: 61

Abstract

How to wreck a nice beach you sing calm incense
A principal problem in speech recognition is distinguishing between words and phrases that sound similar but have different meanings. Speech recognition programs produce a list of weighted candidate hypotheses for a given audio segment and choose the "best" candidate. If the choice is incorrect, the user must invoke a correction interface that displays the list of hypotheses and choose the desired one. The correction interface is time-consuming and accounts for much of the frustration of today's dictation systems. Conventional dictation systems prioritize hypotheses based on language models derived from statistical techniques such as n-grams and Hidden Markov Models. We propose a supplementary method for ordering hypotheses based on Commonsense Knowledge. We filter acoustical and word-frequency hypotheses by testing their plausibility with a semantic network derived from 700,000 statements about everyday life. This often filters out possibilities that "don't make sense" from the user's viewpoint and leads to improved recognition. Reducing the hypothesis space in this way also makes possible streamlined correction interfaces that improve the overall throughput of dictation systems.
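The re-ordering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The tiny `RELATED` set stands in for the semantic network, and the acoustic scores, the stopword list, the `weight` parameter, and the linear blend of scores are all assumptions made up for the example.

```python
from itertools import combinations

# Toy stand-in for a commonsense semantic network: unordered pairs of
# related concepts. These links are invented for illustration only.
RELATED = {
    frozenset(("recognize", "speech")),
    frozenset(("beach", "sand")),
    frozenset(("beach", "ocean")),
}

# Hypothetical list of function words ignored when testing plausibility.
STOPWORDS = {"a", "to", "how", "the", "you"}


def plausibility(words):
    """Return the fraction of content-word pairs linked in the toy network."""
    content = [w for w in words if w not in STOPWORDS]
    pairs = list(combinations(content, 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if frozenset((a, b)) in RELATED)
    return linked / len(pairs)


def rerank(hypotheses, weight=0.5):
    """Sort (text, acoustic_score) pairs by a blend of both scores.

    The linear blend is an assumption of this sketch, not a formula
    taken from the paper.
    """
    def combined(item):
        text, acoustic = item
        return (1 - weight) * acoustic + weight * plausibility(text.lower().split())

    return sorted(hypotheses, key=combined, reverse=True)


# Made-up recognizer output: the acoustically preferred candidate is wrong.
candidates = [
    ("how to wreck a nice beach", 0.52),
    ("how to recognize speech", 0.48),
]
best_text, _ = rerank(candidates)[0]
print(best_text)
```

In this toy run the second candidate moves to the top because its content words are linked in the network, while no pair of content words in the first candidate is. A real system would query a far larger network and would need a more careful way to combine the two scores.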