Visual co-occurrence network: using context for large-scale object recognition in retail
Siddharth Advani, Brigid Smith, Yasuki Tanabe, K. Irick, M. Cotter, J. Sampson, N. Vijaykrishnan
2015 13th IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia), published 2015-12-17
DOI: 10.1109/ESTIMedia.2015.7351774
Citations: 18
Abstract
In any visual object recognition system, classification accuracy will likely determine the usefulness of the system as a whole. In many real-world applications, it is also important to recognize a large number of diverse objects, so that the system is robust enough to handle the sorts of tasks the human visual system handles on an average day. These objectives are often at odds with performance, as running too many detectors on any one scene will be prohibitively slow for any real-time scenario. However, visual information has temporal and spatial context that can be exploited to reduce the number of detectors that need to be triggered at any given time. In this paper, we propose a dynamic approach to encoding such context, called the Visual Co-occurrence Network (ViCoNet), which establishes relationships between objects observed in a visual scene. We investigate the utility of ViCoNet when integrated into a vision pipeline targeted at retail shopping. When evaluated on a large and deep dataset, we achieve a 50% improvement in performance and a 7% improvement in accuracy in the best case, and a 45% improvement in performance and a 3% improvement in accuracy in the average case, over an established baseline. The memory overhead of ViCoNet is around 10 KB, highlighting its effectiveness on temporal big data.
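The abstract describes ViCoNet as a network of co-occurrence relationships that gates which detectors run on a scene. The sketch below is a minimal, hypothetical illustration of that idea, assuming a simple count-based formulation; the paper's actual update rule, thresholds, and object classes are not given in the abstract, so `ViCoNetSketch`, `min_prob`, and the grocery labels here are all illustrative.

```python
# Minimal sketch of a visual co-occurrence network (assumed count-based
# formulation; not the paper's exact method).
from collections import defaultdict


class ViCoNetSketch:
    """Tracks how often object classes co-occur in the same scene and
    uses those counts to shortlist which detectors to trigger next."""

    def __init__(self):
        # cooccur[a][b] = number of scenes in which classes a and b
        # were observed together.
        self.cooccur = defaultdict(lambda: defaultdict(int))
        # seen[a] = number of scenes in which class a appeared at all.
        self.seen = defaultdict(int)

    def update(self, labels_in_scene):
        """Record one scene's detections (an iterable of class labels)."""
        labels = set(labels_in_scene)
        for a in labels:
            self.seen[a] += 1
            for b in labels:
                if a != b:
                    self.cooccur[a][b] += 1

    def likely_neighbors(self, label, min_prob=0.3):
        """Classes whose conditional co-occurrence P(b | label) meets
        min_prob; only their detectors would be run on the next frame."""
        total = self.seen[label]
        if total == 0:
            return []
        return sorted(
            (b for b, n in self.cooccur[label].items() if n / total >= min_prob),
            key=lambda b: -self.cooccur[label][b],
        )


# Usage: after observing cereal and milk together in past frames,
# detecting "cereal" suggests running the "milk" detector next.
net = ViCoNetSketch()
net.update({"cereal", "milk"})
net.update({"cereal", "milk", "bread"})
net.update({"cereal", "bread"})
print(net.likely_neighbors("cereal"))  # -> ['milk', 'bread']
```

Detecting one object then shortlists the detectors for its frequent neighbors rather than running all detectors on every frame, which is consistent with the abstract's claim of reducing the number of detectors triggered at any given time while keeping the context structure small in memory.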