
Latest publications from the 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Data-free Universal Adversarial Perturbation and Black-box Attack
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00777
Chaoning Zhang, Philipp Benz, Adil Karjauv, In-So Kweon
Universal adversarial perturbation (UAP), i.e., a single perturbation that fools the network on most images, is widely recognized as a more practical attack because the UAP can be generated beforehand and applied directly during the attack stage. One intriguing phenomenon regarding untargeted UAP is that most images are misclassified to a dominant label. This phenomenon has been reported in previous works but lacks a well-justified explanation, which our work attempts to provide. For a more practical universal attack, our investigation of untargeted UAP focuses on alleviating the dependence on the original training samples, from removing the need for sample labels to limiting the sample size. Towards strictly data-free untargeted UAP, our work proposes to exploit artificial Jigsaw images as the training samples, demonstrating competitive performance. We further investigate the possibility of exploiting the UAP for a data-free black-box attack, which is arguably the most practical yet challenging threat model. We demonstrate that there exist optimization-free repetitive patterns which can successfully attack deep models. Code is available at https://bit.ly/3y0ZTIC.
Citations: 26
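The "optimization-free repetitive patterns" mentioned in the abstract suggest that even a small patch tiled over the image plane, bounded by an L∞ budget, can act as a universal perturbation. The NumPy sketch below illustrates that general idea; the tile size, epsilon, and random pattern are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def repetitive_perturbation(height, width, tile=8, eps=10 / 255, seed=0):
    """Tile a small random patch over the full image plane, bounded in L-infinity."""
    rng = np.random.default_rng(seed)
    patch = rng.uniform(-eps, eps, size=(tile, tile, 3))
    reps = (int(np.ceil(height / tile)), int(np.ceil(width / tile)), 1)
    uap = np.tile(patch, reps)[:height, :width, :]
    return np.clip(uap, -eps, eps)

def apply_uap(images, uap):
    """Add the same perturbation to every image and keep pixels in [0, 1]."""
    return np.clip(images + uap[None], 0.0, 1.0)

if __name__ == "__main__":
    imgs = np.random.rand(4, 224, 224, 3).astype(np.float32)  # stand-in for real inputs
    uap = repetitive_perturbation(224, 224)
    adv = apply_uap(imgs, uap)
    print(adv.shape, float(np.abs(adv - imgs).max()))  # perturbation stays within the budget
```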
VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01520
Zeyu Hu, Xuyang Bai, Jiaxiang Shang, Runze Zhang, Jiayu Dong, Xin Wang, Guangyuan Sun, Hongbo Fu, Chiew-Lan Tai
In recent years, sparse voxel-based methods have become the state of the art for 3D semantic segmentation of indoor scenes, thanks to powerful 3D CNNs. Nevertheless, being oblivious to the underlying geometry, voxel-based methods suffer from ambiguous features on spatially close objects and struggle to handle complex and irregular geometries due to the lack of geodesic information. In view of this, we present Voxel-Mesh Network (VMNet), a novel 3D deep architecture that operates on voxel and mesh representations, leveraging both Euclidean and geodesic information. Intuitively, the Euclidean information extracted from voxels can offer contextual cues representing interactions between nearby objects, while the geodesic information extracted from meshes can help separate objects that are spatially close but have disconnected surfaces. To incorporate such information from the two domains, we design an intra-domain attentive module for effective feature aggregation and an inter-domain attentive module for adaptive feature fusion. Experimental results validate the effectiveness of VMNet: specifically, on the challenging ScanNet dataset for large-scale segmentation of indoor scenes, it outperforms the state-of-the-art SparseConvNet and MinkowskiNet (74.6% vs 72.5% and 73.6% in mIoU) with a simpler network structure (17M vs 30M and 38M parameters). Code release: https://github.com/hzykent/VMNet
Citations: 40
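The inter-domain attentive module described above fuses Euclidean (voxel) and geodesic (mesh) features. As a rough illustration, the PyTorch sketch below gates two per-vertex feature tensors that are assumed to be already aligned to the same vertices; VMNet's actual module operates on sparse voxels and a mesh hierarchy, so treat this as a simplified stand-in.

```python
import torch
import torch.nn as nn

class InterDomainFusion(nn.Module):
    """Fuse per-vertex voxel (Euclidean) and mesh (geodesic) features with a learned gate."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, voxel_feat, mesh_feat):
        # per-vertex, per-channel weights decide how much of each domain to keep
        a = self.gate(torch.cat([voxel_feat, mesh_feat], dim=-1))
        return a * voxel_feat + (1.0 - a) * mesh_feat

if __name__ == "__main__":
    fuse = InterDomainFusion(channels=64)
    v = torch.randn(1024, 64)   # features interpolated from the voxel branch
    m = torch.randn(1024, 64)   # features from the mesh (geodesic) branch
    print(fuse(v, m).shape)     # torch.Size([1024, 64])
```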
Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00255
Ziteng Cui, Guo-Jun Qi, Lin Gu, Shaodi You, Zenghui Zhang, T. Harada
Dark environments pose a challenge for computer vision algorithms owing to insufficient photons and undesirable noise. To enhance object detection in a dark environment, we propose a novel multitask auto encoding transformation (MAET) model which is able to explore the intrinsic pattern behind illumination translation. In a self-supervised manner, the MAET learns the intrinsic visual structure by encoding and decoding a realistic illumination-degrading transformation that accounts for the physical noise model and image signal processing (ISP). Based on this representation, we achieve the object detection task by decoding the bounding box coordinates and classes. To avoid the over-entanglement of the two tasks, our MAET disentangles the object features from the degradation features by imposing an orthogonal tangent regularity. This forms a parametric manifold along which multitask predictions can be geometrically formulated by maximizing the orthogonality between the tangents along the outputs of the respective tasks. Our framework can be implemented on top of mainstream object detection architectures and trained directly end-to-end using standard object detection datasets such as VOC and COCO. We achieve state-of-the-art performance on synthetic and real-world datasets. Codes will be released at https://github.com/cuiziteng/MAET.
Citations: 42
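The orthogonal tangent regularity amounts to penalizing alignment between the two tasks' directions. A minimal surrogate, assuming per-sample feature vectors for the degradation and detection branches, is to minimize their squared cosine similarity, as sketched below in PyTorch; the paper's manifold-based formulation over task tangents is more involved than this toy loss.

```python
import torch
import torch.nn.functional as F

def orthogonality_penalty(feat_task_a, feat_task_b, eps=1e-8):
    """Penalize alignment between two per-sample feature directions.

    Both inputs are (B, D) tensors; the loss is the mean squared cosine similarity,
    which is zero exactly when the two directions are orthogonal for every sample.
    """
    a = F.normalize(feat_task_a, dim=-1, eps=eps)
    b = F.normalize(feat_task_b, dim=-1, eps=eps)
    cos = (a * b).sum(dim=-1)
    return (cos ** 2).mean()

if __name__ == "__main__":
    degrade_feat = torch.randn(8, 256, requires_grad=True)  # degradation-task features
    detect_feat = torch.randn(8, 256, requires_grad=True)   # detection-task features
    loss = orthogonality_penalty(degrade_feat, detect_feat)
    loss.backward()
    print(float(loss))
```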
CrossDet: Crossline Representation for Object Detection
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00318
Heqian Qiu, Hongliang Li, Qingbo Wu, Jianhua Cui, Zichen Song, Lanxiao Wang, Minjian Zhang
Object detection aims to accurately locate and classify objects in an image, which requires precise object representations. Existing methods usually use rectangular anchor boxes or a set of points to represent objects. However, these methods either introduce background noise or miss the continuous appearance information inside the object, and thus cause incorrect detection results. In this paper, we propose a novel anchor-free object detection network, called CrossDet, which uses a set of growing cross lines along the horizontal and vertical axes as object representations. An object can be flexibly represented as cross lines in different combinations. This not only effectively reduces the interference of noise, but also takes into account continuous object information, which helps enhance the discriminability of object features and find object boundaries. Based on the learned cross lines, we propose a crossline extraction module to adaptively capture features of the cross lines. Furthermore, we design a decoupled regression mechanism that regresses the localization along the horizontal and vertical directions respectively, which helps to decrease the optimization difficulty because the optimization space is limited to a specific direction. Our method achieves consistent improvements on the PASCAL VOC and MS-COCO datasets. The experimental results demonstrate the effectiveness of our proposed method. Code is available at: https://github.com/QiuHeqian/CrossDet.
Citations: 9
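To make the crossline representation concrete, the sketch below pools feature-map values along the horizontal and vertical lines through each anchor point and concatenates the two descriptors. It assumes fixed full-length lines and integer point coordinates, whereas CrossDet learns growing line extents, so this is only a schematic of the idea.

```python
import torch
import torch.nn as nn

class CrosslinePooling(nn.Module):
    """Pool features along the horizontal and vertical lines through each anchor point.

    feat: (B, C, H, W) feature map; points: (B, N, 2) integer locations given as (y, x).
    For each point, the full row and full column through it are max-pooled into a
    2C-dimensional descriptor.
    """

    def forward(self, feat, points):
        descs = []
        for bi in range(feat.shape[0]):
            ys, xs = points[bi, :, 0], points[bi, :, 1]
            rows = feat[bi, :, ys, :]           # (C, N, W) features along horizontal lines
            cols = feat[bi, :, :, xs]           # (C, H, N) features along vertical lines
            row_desc = rows.max(dim=-1).values  # (C, N)
            col_desc = cols.max(dim=1).values   # (C, N)
            descs.append(torch.cat([row_desc, col_desc], dim=0).t())  # (N, 2C)
        return torch.stack(descs)               # (B, N, 2C)

if __name__ == "__main__":
    feat = torch.randn(2, 32, 64, 64)
    pts = torch.randint(0, 64, (2, 10, 2))
    print(CrosslinePooling()(feat, pts).shape)  # torch.Size([2, 10, 64])
```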
Explainable Video Entailment with Grounded Visual Evidence
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00203
Junwen Chen, Yu Kong
Video entailment aims at determining whether a hypothesis textual statement is entailed or contradicted by a premise video. The main challenge of video entailment is that it requires fine-grained reasoning to understand complex, long, story-based videos. To this end, we propose to incorporate visual grounding into the entailment by explicitly linking the entities described in the statement to the evidence in the video. If the entities are grounded in the video, we enhance the entailment judgment by focusing on the frames where the entities occur. Moreover, in the entailment dataset, the entailed/contradictory (also referred to as real/fake) statements come in pairs with subtle discrepancies, which allows an add-on explanation module to predict which words or phrases make a statement contradict the video and to regularize the training of the entailment judgment. Experimental results demonstrate that our approach outperforms the state-of-the-art methods.
Citations: 7
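The idea of focusing the entailment judgment on frames where grounded entities occur can be illustrated with a simple similarity-weighted frame pooling, sketched below in PyTorch. The tensor shapes and the softmax weighting are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def entity_grounded_pooling(frame_feats, entity_embs):
    """Weight video frames by their similarity to grounded entity embeddings.

    frame_feats: (T, D) per-frame features; entity_embs: (E, D) embeddings of entities
    mentioned in the hypothesis. Frames that match some entity get a higher weight in
    the pooled video representation.
    """
    sims = frame_feats @ entity_embs.t()             # (T, E) frame-entity similarities
    frame_scores = sims.max(dim=-1).values           # best-matching entity per frame
    weights = F.softmax(frame_scores, dim=0)         # (T,)
    return (weights.unsqueeze(-1) * frame_feats).sum(dim=0)  # (D,)

if __name__ == "__main__":
    video = torch.randn(32, 512)    # 32 frames
    entities = torch.randn(3, 512)  # e.g. embeddings of three grounded entities
    print(entity_grounded_pooling(video, entities).shape)  # torch.Size([512])
```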
Adversarial Example Detection Using Latent Neighborhood Graph
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00759
Ahmed A. Abusnaina, Yuhang Wu, Sunpreet S. Arora, Yizhen Wang, Fei Wang, Hao Yang, David A. Mohaisen
Detection of adversarial examples with high accuracy is critical for the security of deployed deep neural network-based models. We present the first graph-based adversarial detection method that constructs a Latent Neighborhood Graph (LNG) around an input example to determine if the input example is adversarial. Given an input example, selected reference adversarial and benign examples (represented as LNG nodes in Figure 1) are used to capture the local manifold in the vicinity of the input example. The LNG node connectivity parameters are optimized jointly with the parameters of a graph attention network in an end-to-end manner to determine the optimal graph topology for adversarial example detection. The graph attention network is used to determine if the LNG is derived from an adversarial or benign input example. Experimental evaluations on CIFAR-10, STL-10, and ImageNet datasets, using six adversarial attack methods, demonstrate that the proposed method outperforms state-of-the-art adversarial detection methods in white-box and gray-box settings. The proposed method is able to successfully detect adversarial examples crafted with small perturbations using unseen attacks.
Citations: 32
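The first step of the method, building a Latent Neighborhood Graph around an input embedding, can be approximated by a plain k-nearest-neighbor graph over cached reference embeddings, as in the NumPy sketch below; in the paper the connectivity parameters are learned jointly with the graph attention network, so Euclidean k-NN here is only a stand-in for the graph construction step.

```python
import numpy as np

def build_latent_neighborhood_graph(query_emb, reference_embs, k=5):
    """Connect an input embedding to its k nearest reference embeddings.

    Returns the indices of the selected neighbours and a symmetric adjacency matrix
    over the (k + 1)-node graph, with the query node placed last.
    """
    dists = np.linalg.norm(reference_embs - query_emb[None], axis=1)
    neighbours = np.argsort(dists)[:k]
    n = k + 1
    adj = np.zeros((n, n), dtype=np.float32)
    adj[-1, :k] = 1.0                # query connected to each selected neighbour
    adj[:k, -1] = 1.0
    np.fill_diagonal(adj, 1.0)       # self-loops, as is common for GNN inputs
    return neighbours, adj

if __name__ == "__main__":
    refs = np.random.randn(1000, 128).astype(np.float32)  # cached benign/adversarial references
    query = np.random.randn(128).astype(np.float32)
    idx, adj = build_latent_neighborhood_graph(query, refs, k=5)
    print(idx.shape, adj.shape)  # (5,) (6, 6)
```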
Event Stream Super-Resolution via Spatiotemporal Constraint Learning
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00444
Siqi Li, Yutong Feng, Yipeng Li, Yu Jiang, C. Zou, Yue Gao
Event cameras are bio-inspired sensors that respond to brightness changes asynchronously and output event streams instead of frame-based images. They offer outstanding advantages over traditional cameras: higher temporal resolution, higher dynamic range, and lower power consumption. However, the spatial resolution of existing event cameras is insufficient and difficult to enhance at the hardware level while maintaining the asynchronous philosophy of the circuit design. Therefore, it is imperative to explore algorithms for event stream super-resolution, a non-trivial task due to the sparsity and strong spatiotemporal correlation of the events from an event camera. In this paper, we propose an end-to-end framework based on a spiking neural network for event stream super-resolution, which can generate a high-resolution (HR) event stream from an input low-resolution (LR) event stream. A spatiotemporal constraint learning mechanism is proposed to learn the spatial and temporal distributions of the event stream simultaneously. We validate our method on four large-scale datasets and the results show that it achieves state-of-the-art performance. The satisfactory results on two downstream applications, i.e., object classification and image reconstruction, further demonstrate the usability of our method. To prove the application potential of our method, we deploy it on a mobile platform. The high-quality HR event stream generated by our real-time system demonstrates the effectiveness and efficiency of our method.
Citations: 8
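Event-based methods typically start by accumulating the asynchronous stream into a spatiotemporal tensor. The NumPy sketch below shows that standard preprocessing step (binning events into a voxel grid); the paper's actual contribution, a spiking network that outputs a higher-resolution event stream, is not reproduced here.

```python
import numpy as np

def events_to_voxel_grid(events, bins, height, width):
    """Accumulate an event stream into a (bins, H, W) spatiotemporal grid.

    events: array of shape (N, 4) with columns (t, x, y, polarity in {-1, +1}).
    """
    grid = np.zeros((bins, height, width), dtype=np.float32)
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)    # normalise timestamps to [0, 1]
    b = np.minimum((t_norm * bins).astype(int), bins - 1)    # temporal bin of each event
    np.add.at(grid, (b, y, x), p)                            # signed accumulation per cell
    return grid

if __name__ == "__main__":
    n = 10000
    ev = np.stack([np.sort(np.random.rand(n)),                 # timestamps
                   np.random.randint(0, 64, n),                # x
                   np.random.randint(0, 48, n),                # y
                   np.random.choice([-1.0, 1.0], n)], axis=1)  # polarity
    print(events_to_voxel_grid(ev, bins=5, height=48, width=64).shape)  # (5, 48, 64)
```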
Exploring Geometry-aware Contrast and Clustering Harmonization for Self-supervised 3D Object Detection
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00328
Hanxue Liang, Chenhan Jiang, Dapeng Feng, Xin Chen, Hang Xu, Xiaodan Liang, Wei Zhang, Zhenguo Li, L. Van Gool
Current 3D object detection paradigms rely heavily on extensive annotation efforts, which makes them impractical in many real-world industrial applications. Inspired by the fact that a human driver can keep accumulating experience by self-exploring roads without any tutor's guidance, we take a first step towards a simple yet effective self-supervised learning framework tailored for LiDAR-based 3D object detection. Although the self-supervised pipeline has achieved great success in the 2D domain, the characteristic challenges encountered in the 3D domain (e.g., complex geometric structures and varied 3D object views) hinder the direct adoption of existing techniques, which often contrast 2D augmented data or cluster single-view features. Here we present a novel self-supervised 3D object detection framework, named GCC-3D, that seamlessly integrates geometry-aware contrast and clustering harmonization to lift unsupervised 3D representation learning. First, GCC-3D introduces a Geometric-Aware Contrastive objective to learn spatially sensitive local structure representations. This objective enforces spatially close voxels to have high feature similarity. Second, a Pseudo-Instance Clustering harmonization mechanism is proposed to encourage different views of pseudo-instances to have consistent similarities to the clustering prototype centers. This module endows our model with semantic discriminative capacity. Extensive experiments demonstrate that GCC-3D achieves significant performance improvements on data-efficient 3D object detection benchmarks (nuScenes and Waymo). Moreover, our GCC-3D framework achieves state-of-the-art performance on all popular 3D object detection benchmarks.
Citations: 40
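The Geometric-Aware Contrastive objective pulls together features of spatially close voxels. A compact InfoNCE-style stand-in, assuming dense per-voxel features and coordinates plus an illustrative radius and temperature, is sketched below in PyTorch; it is not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def geometry_aware_contrastive_loss(feats, coords, pos_radius=1.0, temperature=0.1):
    """Pull together features of spatially close voxels, push apart the rest.

    feats: (N, D) voxel features; coords: (N, 3) voxel centres. For each voxel the
    positives are all other voxels within `pos_radius`; everything else is a negative.
    """
    feats = F.normalize(feats, dim=-1)
    logits = feats @ feats.t() / temperature                  # (N, N) similarities
    dists = torch.cdist(coords, coords)                       # (N, N) Euclidean distances
    self_mask = torch.eye(len(feats), dtype=torch.bool)
    pos_mask = (dists < pos_radius) & ~self_mask
    logits = logits.masked_fill(self_mask, float('-inf'))     # never contrast a voxel with itself
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    per_anchor = (log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return -(per_anchor[pos_mask.any(dim=1)]).mean()          # only anchors that have positives

if __name__ == "__main__":
    n = 256
    feats = torch.randn(n, 64, requires_grad=True)
    coords = torch.rand(n, 3) * 5.0
    loss = geometry_aware_contrastive_loss(feats, coords)
    loss.backward()
    print(float(loss))
```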
Interacting Two-Hand 3D Pose and Shape Reconstruction from Single Color Image
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01116
Baowen Zhang, Yangang Wang, Xiaoming Deng, Yinda Zhang, P. Tan, Cuixia Ma, Hongan Wang
In this paper, we propose a novel deep learning framework to reconstruct the 3D hand poses and shapes of two interacting hands from a single color image. Previous methods designed for a single hand cannot be easily applied to the two-hand scenario because of the heavy inter-hand occlusion and the larger solution space. To address the occlusion and the similar appearance between hands that may confuse the network, we design a hand pose-aware attention module to extract features associated with each individual hand. We then leverage the two-hand context present in the interaction to propose a context-aware cascaded refinement that improves the pose and shape accuracy of each hand conditioned on the context between the interacting hands. Extensive experiments on the main benchmark datasets demonstrate that our method predicts accurate 3D hand poses and shapes from a single color image and achieves state-of-the-art performance. Code is available on the project webpage https://baowenz.github.io/Intershape/.
Citations: 52
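One way to read the hand pose-aware attention module is as per-hand spatial attention pooling over image features. The PyTorch sketch below predicts one attention map per hand and pools a hand-specific feature vector from it; this is a generic attention-pooling illustration inspired by the abstract, not the authors' exact design.

```python
import torch
import torch.nn as nn

class HandAwareAttention(nn.Module):
    """Predict one spatial attention map per hand and pool hand-specific features.

    Given image features (B, C, H, W), a 1x1 convolution produces one attention map per
    hand; each map is normalised over space and used to pool a per-hand feature vector.
    """

    def __init__(self, channels, num_hands=2):
        super().__init__()
        self.att = nn.Conv2d(channels, num_hands, kernel_size=1)

    def forward(self, feat):
        b, c, h, w = feat.shape
        maps = torch.softmax(self.att(feat).flatten(2), dim=-1)       # (B, 2, H*W)
        pooled = torch.einsum('bkn,bcn->bkc', maps, feat.flatten(2))  # (B, 2, C)
        return pooled, maps.view(b, -1, h, w)

if __name__ == "__main__":
    module = HandAwareAttention(channels=256)
    feats = torch.randn(4, 256, 32, 32)
    per_hand, att_maps = module(feats)
    print(per_hand.shape, att_maps.shape)  # torch.Size([4, 2, 256]) torch.Size([4, 2, 32, 32])
```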
Spatial-Temporal Consistency Network for Low-Latency Trajectory Forecasting
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00195
Shijie Li, Yanying Zhou, Jinhui Yi, Juergen Gall
Trajectory forecasting is a crucial step for autonomous vehicles and mobile robots to navigate and interact safely. To handle the spatial interactions between objects, graph-based approaches have been proposed. These methods, however, model motion on a frame-to-frame basis and do not provide a strong temporal model. To overcome this limitation, we propose a compact model called Spatial-Temporal Consistency Network (STC-Net). In STC-Net, dilated temporal convolutions are introduced to model long-range dependencies along each trajectory for better temporal modeling, while graph convolutions are employed to model the spatial interactions among different trajectories. Furthermore, we propose a feature-wise convolution to generate the predicted trajectories in one pass and refine the forecast trajectories together with the reconstructed observed trajectories. We demonstrate that STC-Net generates spatially and temporally consistent trajectories and outperforms other graph-based methods. Since STC-Net requires only 0.7k parameters and forecasts the future with a latency of only 1.3 ms, it advances the state of the art and satisfies the requirements of realistic applications.
Citations: 12
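The dilated temporal convolutions at the core of STC-Net can be illustrated with a small residual block whose dilation doubles at every layer, so the receptive field grows exponentially while the parameter count stays small. The PyTorch sketch below uses illustrative channel counts and layer numbers, not the STC-Net configuration.

```python
import torch
import torch.nn as nn

class DilatedTemporalBlock(nn.Module):
    """Stack of 1D convolutions with growing dilation over a trajectory sequence.

    Input is (B, C, T): per-agent features over T time steps. Padding keeps the
    temporal length unchanged, and a residual connection stabilises training.
    """

    def __init__(self, channels, num_layers=3, kernel_size=3):
        super().__init__()
        layers = []
        for i in range(num_layers):
            d = 2 ** i  # dilation doubles at every layer
            layers += [
                nn.Conv1d(channels, channels, kernel_size, dilation=d,
                          padding=d * (kernel_size - 1) // 2),
                nn.ReLU(inplace=True),
            ]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x) + x

if __name__ == "__main__":
    block = DilatedTemporalBlock(channels=64)
    traj = torch.randn(8, 64, 20)                       # 8 agents, 20 observed steps
    print(block(traj).shape)                            # torch.Size([8, 64, 20])
    print(sum(p.numel() for p in block.parameters()))   # small parameter budget
```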