Speech technology for multimedia content management

Patrick Nguyen, David Kryze, R. Kuhn, M. Kobayashi, M. Yasukata
First IEEE Consumer Communications and Networking Conference, 2004 (CCNC 2004). Published 2004-04-19. DOI: 10.1109/CCNC.2004.1286891
Citations: 1

Abstract

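Multimedia content cannot be retrieved effectively unless metadata describing it is generated. However, metadata generation tends to be time-consuming and expensive, since it typically requires people to go through the content and tag it manually. This paper shows how automatic speech recognition (ASR) technology can be used to generate metadata with significantly less human effort. It describes two approaches: voice tagging, in which people still tag the data but the tagging process is sped up by applying ASR to it; and audio indexing, in which much of the tagging is automated by applying ASR to the content itself.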
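The audio-indexing idea can be illustrated with a minimal sketch. It assumes an ASR engine has already produced, for each clip, a list of recognized words with start times and confidence scores; the `Word` record, the 0.5 confidence threshold, the clip IDs, and the functions `build_index` and `search` are all hypothetical and are not taken from the paper, which does not describe its indexing scheme in the abstract. The sketch builds a plain inverted index from words to (clip, time) postings, so a text query can point to the moments where a term was spoken.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word from an ASR pass (hypothetical output format)."""
    text: str
    start: float       # start time in seconds within the clip
    confidence: float  # recognizer confidence in [0, 1]


def build_index(transcripts, min_confidence=0.5):
    """Build an inverted index: lowercased word -> list of (clip_id, start) postings.

    Words below min_confidence are skipped, on the assumption that they are
    likely misrecognitions.
    """
    index = defaultdict(list)
    for clip_id, words in transcripts.items():
        for w in words:
            if w.confidence >= min_confidence:
                index[w.text.lower()].append((clip_id, w.start))
    return index


def search(index, query):
    """Return {clip_id: sorted hit times} for clips containing every query term."""
    terms = query.lower().split()
    if not terms:
        return {}
    # .get() does not insert missing keys into the defaultdict.
    postings = [index.get(term, []) for term in terms]
    # Keep only clips in which all query terms were recognized (AND semantics).
    matching = set.intersection(*({clip for clip, _ in p} for p in postings))
    return {
        clip: sorted(start for p in postings for c, start in p if c == clip)
        for clip in sorted(matching)
    }


# Hypothetical ASR output for two news clips.
transcripts = {
    "news_0412": [Word("Stock", 3.2, 0.9), Word("market", 3.6, 0.85), Word("rally", 4.1, 0.4)],
    "news_0413": [Word("market", 10.0, 0.95), Word("weather", 12.5, 0.9)],
}
index = build_index(transcripts)
print(search(index, "market"))        # {'news_0412': [3.6], 'news_0413': [10.0]}
print(search(index, "stock market"))  # {'news_0412': [3.2, 3.6]}
print(search(index, "rally"))         # {} (only a low-confidence hypothesis existed)
```

Dropping low-confidence words trades recall for precision; a real system would likely also need to cope with misrecognized and out-of-vocabulary terms, which this sketch ignores.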