Listen, look, and gotcha: instant video search with mobile phones by layered audio-video indexing

Wu Liu, Tao Mei, Yongdong Zhang, Jintao Li, Shipeng Li
Proceedings of the 21st ACM international conference on Multimedia, 2013-10-21
DOI: 10.1145/2502081.2502084 (https://doi.org/10.1145/2502081.2502084)
Citations: 15

Abstract

Mobile video is quickly becoming a mass consumer phenomenon. More and more people use their smartphones to search and browse video content while on the move. In this paper, we develop an instant mobile video search system through which users can discover videos by simply pointing their phones at a screen and capturing a few seconds of what they are watching. The system indexes large-scale video data in the cloud using a new layered audio-video indexing approach, extracts light-weight joint audio-video signatures in real time, and performs progressive search on mobile devices. Unlike most existing mobile video search applications, which simply send the original video query to the cloud, the proposed system is one of the first attempts at instant and progressive video search that leverages the light-weight computing capacity of mobile devices. The system is characterized by four properties: 1) a joint audio-video signature to deal with the large aural and visual variances in query videos captured by mobile phones, 2) layered audio-video indexing to holistically exploit the complementary nature of audio and video signals, 3) light-weight fingerprinting to fit mobile processing capacity, and 4) a progressive query process that significantly reduces computational cost and improves the user experience, since the search can stop as soon as a confident result is achieved. We collected 1,400 query videos captured by 25 mobile users from a dataset of 600 hours of video. The experiments show that our system outperforms state-of-the-art methods, achieving 90.79% precision when the query video is shorter than 10 seconds and 70.07% even when it is shorter than 5 seconds.
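The progressive query process described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the signature format, the index layout, or the stopping rule. The sketch assumes each second of captured video yields one hashable signature, that the index is a plain dictionary from signature to the set of videos containing it, and that a simple vote share serves as the confidence measure. The names `progressive_search`, `TOY_INDEX`, `confidence`, and `min_votes` are hypothetical.

```python
from collections import Counter

# Hypothetical toy index: signature -> set of video ids that contain it.
TOY_INDEX = {
    101: {"video_a"},
    102: {"video_a", "video_b"},
    103: {"video_a"},
    104: {"video_a"},
    201: {"video_b"},
}


def progressive_search(signature_stream, index, confidence=0.6, min_votes=3):
    """Consume one signature per second and stop once a single video dominates.

    Returns (best_video_id_or_None, seconds_consumed). The vote-share rule
    is an assumption made for illustration only.
    """
    votes = Counter()
    seconds = 0
    for seconds, signature in enumerate(signature_stream, start=1):
        # Each signature votes for every indexed video that contains it.
        for video_id in index.get(signature, ()):
            votes[video_id] += 1
        if not votes:
            continue
        best_id, best_votes = votes.most_common(1)[0]
        total = sum(votes.values())
        # Early stop: the leader has enough votes and a large enough share.
        if best_votes >= min_votes and best_votes / total >= confidence:
            return best_id, seconds
    # Stream exhausted without a confident match: return the best guess, if any.
    best = votes.most_common(1)[0][0] if votes else None
    return best, seconds


if __name__ == "__main__":
    # Stops after three of the five available seconds.
    print(progressive_search([102, 101, 103, 104, 999], TOY_INDEX))
```

In this toy run the search returns after the third signature, so the remaining two seconds of the query are never processed. That is the cost saving a progressive query is meant to provide, under the simplified assumptions stated above.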