Speech technology for multimedia content management

First IEEE Consumer Communications and Networking Conference, 2004 (CCNC 2004). Published: 2004-04-19. DOI: 10.1109/CCNC.2004.1286891

Patrick Nguyen, David Kryze, R. Kuhn, M. Kobayashi, M. Yasukata


Citations: 1

Abstract

Multimedia content cannot be retrieved effectively unless metadata describing it is generated. However, metadata generation tends to be time-consuming and expensive, since it typically requires people to go through the content and tag it manually. The paper shows how automatic speech recognition (ASR) technology can be used to generate metadata with significantly less human effort. It describes two approaches: voice tagging, in which people still tag the data but ASR is applied to the tagging process to speed it up; and audio indexing, in which much of the tagging is automated by applying ASR to the content itself.
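As a rough illustration of the audio indexing idea (not the authors' implementation, which the abstract does not describe), the sketch below builds an inverted index from words to the clips and times at which they were recognized. It assumes an ASR system has already produced time-stamped words; the clip ids, the sample transcripts, and the function names build_index and search are all hypothetical.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each lowercased word to a list of (clip_id, start_seconds) hits."""
    index = defaultdict(list)
    for clip_id, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((clip_id, start))
    return index


def search(index, query):
    """Return hits for a single-word query, sorted by clip id, then time."""
    return sorted(index.get(query.lower(), []))


# Hypothetical ASR output: clip id -> [(recognized word, start time in seconds)]
transcripts = {
    "news_01": [("Election", 2.5), ("results", 3.1), ("tonight", 3.8)],
    "news_02": [("weather", 0.4), ("tonight", 1.2)],
}

index = build_index(transcripts)
hits = search(index, "tonight")  # clips and offsets where the word was spoken
```

Each hit points to a position inside a clip, so a retrieval interface could jump straight to the moment a word was spoken. A real system would also have to cope with recognition errors, which this sketch ignores.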

