Visual co-occurrence network: using context for large-scale object recognition in retail

Siddharth Advani, Brigid Smith, Yasuki Tanabe, K. Irick, M. Cotter, J. Sampson, N. Vijaykrishnan
Published in: 2015 13th IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia), 2015-12-17
DOI: 10.1109/ESTIMedia.2015.7351774 (https://doi.org/10.1109/ESTIMedia.2015.7351774)
Citations: 18

Abstract

In any visual object recognition system, classification accuracy will likely determine the usefulness of the system as a whole. In many real-world applications, the system must also recognize a large number of diverse objects to be robust enough for the sort of tasks that the human visual system handles on an average day. These objectives are often at odds with performance, as running too many detectors on any one scene is prohibitively slow for real-time use. However, visual information has temporal and spatial context that can be exploited to reduce the number of detectors that need to be triggered at any given instant. In this paper, we propose a dynamic approach to encoding such context, called Visual Co-occurrence Network (ViCoNet), which establishes relationships between objects observed in a visual scene. We investigate the utility of ViCoNet when integrated into a vision pipeline targeted at retail shopping. When evaluated on a large and deep dataset, we achieve a 50% improvement in performance and a 7% improvement in accuracy in the best case, and a 45% improvement in performance and a 3% improvement in accuracy in the average case, over an established baseline. The memory overhead of ViCoNet is around 10KB, highlighting its effectiveness on temporal big data.
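
The abstract does not say how ViCoNet is built or queried, so the following is only an illustrative sketch of the general idea: use co-occurrence statistics to decide which detectors to run. Everything in it is an assumption made for illustration and is not the authors' method: the class `CooccurrenceNetwork`, its methods `observe` and `candidates`, the `top_k` parameter, and the product labels are all hypothetical. Edge weights are plain counts of how often two labels appeared in the same scene, and detectors are ranked by their total weight to recently seen labels.

```python
from collections import defaultdict
from itertools import combinations


class CooccurrenceNetwork:
    """Toy co-occurrence graph (hypothetical, not the paper's ViCoNet).

    Nodes are object labels; an edge weight counts how many scenes
    contained both labels.
    """

    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(int))

    def observe(self, labels):
        """Record that every pair of distinct labels appeared in one scene."""
        for a, b in combinations(sorted(set(labels)), 2):
            self.weights[a][b] += 1
            self.weights[b][a] += 1

    def candidates(self, recent_labels, top_k):
        """Return up to top_k labels most strongly linked to recent_labels."""
        scores = defaultdict(int)
        for seen in recent_labels:
            # .get avoids creating empty entries for labels never observed.
            for other, weight in self.weights.get(seen, {}).items():
                scores[other] += weight
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [label for label, _ in ranked[:top_k]]


net = CooccurrenceNetwork()
net.observe(["cereal", "milk", "oatmeal"])
net.observe(["cereal", "milk"])
net.observe(["shampoo", "soap"])

# After seeing cereal, only the detectors for its likely neighbours are run.
print(net.candidates(["cereal"], top_k=2))
```

In this sketch, only the detectors returned by `candidates` would be triggered for the next frame, which is one way to cut the number of detectors per scene. A real pipeline would also need a fallback for scenes with no known context, which is not shown here.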