Ventral-Dorsal Neural Networks: Object Detection Via Selective Attention

2019 IEEE Winter Conference on Applications of Computer Vision (WACV). Pub Date: 2019-01-01. DOI: 10.1109/WACV.2019.00110

M. K. Ebrahimpour, Jiayun Li, Yen-Yun Yu, Jackson Reese, Azadeh Moghtaderi, Ming-Hsuan Yang, D. Noelle


Citations: 18

Abstract

Deep Convolutional Neural Networks (CNNs) have repeatedly been shown to perform well on image classification tasks. Object detection methods, however, still need significant improvement. In this paper, we propose a new framework called Ventral-Dorsal Networks (VDNets), which is inspired by the structure of the human visual system. Roughly, the visual input signal is analyzed along two separate neural streams, one in the temporal lobe and the other in the parietal lobe. The coarse functional distinction between these streams is between object recognition (the "what" of the signal) and the extraction of location-related information (the "where" of the signal). The ventral pathway from primary visual cortex, entering the temporal lobe, is dominated by "what" information, while the dorsal pathway, into the parietal lobe, is dominated by "where" information. Inspired by this structure, we propose the integration of a "Ventral Network" and a "Dorsal Network", which are complementary. Information about object identity can guide localization, and location information can guide attention to relevant image regions, improving object recognition. This new dual-network framework sharpens the focus of object detection. Our experimental results show that the proposed method outperforms state-of-the-art object detection approaches on PASCAL VOC 2007 by 8% (mAP) and on PASCAL VOC 2012 by 3% (mAP). Moreover, a comparison of techniques on Yearbook images shows substantial qualitative and quantitative benefits of VDNet.
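The abstract does not spell out how the two networks are coupled, so the following is only an illustrative sketch of the general idea that one network's output can guide attention for the other. It assumes, hypothetically, that the Ventral Network yields a per-pixel saliency map and that attention is applied by zeroing out low-saliency pixels before the Dorsal Network sees the image. The function names `attention_mask` and `apply_attention`, the `keep_fraction` parameter, and the thresholding rule are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List

Grid = List[List[float]]  # a 2-D grayscale image or saliency map


def attention_mask(saliency: Grid, keep_fraction: float = 0.5) -> List[List[bool]]:
    """Mark pixels whose saliency is at least keep_fraction of the peak value.

    Assumption: relative thresholding stands in for whatever selection rule
    a real system would use. If the map is all zeros, every pixel is kept.
    """
    peak = max(max(row) for row in saliency)
    threshold = keep_fraction * peak
    return [[value >= threshold for value in row] for row in saliency]


def apply_attention(image: Grid, saliency: Grid, keep_fraction: float = 0.5) -> Grid:
    """Zero out image pixels outside the attended region."""
    mask = attention_mask(saliency, keep_fraction)
    return [
        [pixel if keep else 0.0 for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 4x4 example: a uniform image with a salient 2x2 patch in the centre.
image = [[1.0] * 4 for _ in range(4)]
saliency = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
masked = apply_attention(image, saliency)
```

In this toy case only the four central pixels survive the mask. In a real pipeline the masked image would then be handed to a detector playing the role of the Dorsal Network.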

