Patrick Nguyen, David Kryze, R. Kuhn, M. Kobayashi, M. Yasukata
First IEEE Consumer Communications and Networking Conference (CCNC 2004), published 2004-04-19. DOI: 10.1109/CCNC.2004.1286891
Speech technology for multimedia content management
Multimedia content cannot be retrieved effectively unless metadata describing it has been generated. However, metadata generation tends to be time-consuming and expensive, since it typically involves people going through the content and tagging it manually. The paper shows how automatic speech recognition (ASR) technology can be used to generate metadata with significantly less human effort. It describes two approaches: voice tagging, in which people still tag the data but the process is sped up by applying ASR to the tagging itself; and audio indexing, in which much of the tagging is automated by applying ASR to the content itself.
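The two approaches differ mainly in where the ASR output comes from and how precisely it is located in the content. The following is a minimal sketch, not the paper's implementation. It assumes a hypothetical ASR engine that returns word hypotheses with start times (modeled here as `RecognizedWord`). It also stores both kinds of metadata in a simple inverted index (the hypothetical `MetadataStore`), which is an illustrative choice. A voice tag describes the whole clip, so it carries no time offset. An audio-index entry points to the moment in the clip's soundtrack where the word was recognized.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List, Optional, Tuple

# A posting is (clip_id, start time in seconds); None means "applies to the whole clip".
Posting = Tuple[str, Optional[float]]


@dataclass
class RecognizedWord:
    """One word hypothesis from a hypothetical ASR engine (not modeled here)."""
    text: str
    start_sec: float


@dataclass
class MetadataStore:
    """Hypothetical inverted index: normalized term -> list of postings."""
    index: DefaultDict[str, List[Posting]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def add_voice_tag(self, clip_id: str, tag_words: List[RecognizedWord]) -> None:
        """Voice tagging: a person speaks a tag, and the ASR output is attached
        to the whole clip. Time offsets inside the spoken tag are discarded."""
        for word in tag_words:
            self.index[word.text.lower()].append((clip_id, None))

    def add_audio_index(self, clip_id: str, transcript: List[RecognizedWord]) -> None:
        """Audio indexing: ASR runs on the clip's own soundtrack, and each
        recognized word keeps its time offset so a search can jump to it."""
        for word in transcript:
            self.index[word.text.lower()].append((clip_id, word.start_sec))

    def search(self, term: str) -> List[Posting]:
        """Return all postings for a term; .get avoids inserting empty keys."""
        return list(self.index.get(term.lower(), []))
```

For example, after `store.add_voice_tag("clip1", [RecognizedWord("Beach", 0.0)])` and `store.add_audio_index("clip2", [RecognizedWord("beach", 1.5)])`, the call `store.search("beach")` returns both clips. The first hit is clip-level and the second is time-coded. In practice ASR errors would make some postings wrong. This sketch does not attempt to model recognition confidence.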