
2021 IEEE/CVF International Conference on Computer Vision (ICCV): Latest Publications

On Equivariant and Invariant Learning of Object Landmark Representations
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00975
Zezhou Cheng, Jong-Chyi Su, Subhransu Maji
Given a collection of images, humans are able to discover landmarks by modeling the shared geometric structure across instances. This idea of geometric equivariance has been widely used for the unsupervised discovery of object landmark representations. In this paper, we develop a simple and effective approach by combining instance-discriminative and spatially-discriminative contrastive learning. We show that when a deep network is trained to be invariant to geometric and photometric transformations, representations emerge from its intermediate layers that are highly predictive of object landmarks. Stacking these across layers in a "hypercolumn" and projecting them using spatially-contrastive learning further improves their performance on matching and few-shot landmark regression tasks. We also present a unified view of existing equivariant and invariant representation learning approaches through the lens of contrastive learning, shedding light on the nature of invariances learned. Experiments on standard benchmarks for landmark learning, as well as a new challenging one we propose, show that the proposed approach surpasses prior state-of-the-art.
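The pipeline the abstract outlines, pulling intermediate feature maps out of a backbone, upsampling them into a "hypercolumn", and projecting them for a spatially-contrastive loss, can be sketched briefly in PyTorch. This is a minimal illustration under assumptions (ResNet-18 backbone, layers 1-3, a 64x64 output grid, a 128-dim projection), not the authors' code.

```python
# Minimal hypercolumn sketch: collect intermediate feature maps via hooks,
# upsample them to a common grid, and stack them per pixel.
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.resnet18(weights=None)
layers = {"layer1": None, "layer2": None, "layer3": None}

def hook(name):
    def _hook(module, inp, out):
        layers[name] = out
    return _hook

for name in layers:
    getattr(backbone, name).register_forward_hook(hook(name))

def hypercolumn(images, out_size=64):
    """Run the backbone, upsample selected feature maps, and concatenate them."""
    backbone.eval()
    with torch.no_grad():
        backbone(images)
    feats = [F.interpolate(layers[n], size=(out_size, out_size),
                           mode="bilinear", align_corners=False) for n in layers]
    return torch.cat(feats, dim=1)           # (B, sum of channels, out_size, out_size)

x = torch.randn(2, 3, 128, 128)
hc = hypercolumn(x)
# A spatially shared 1x1-conv projection head on which a spatially-contrastive
# loss could be applied; the 128-dim output is an arbitrary choice here.
proj = torch.nn.Conv2d(hc.shape[1], 128, kernel_size=1)
z = F.normalize(proj(hc), dim=1)
print(hc.shape, z.shape)
```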
{"title":"On Equivariant and Invariant Learning of Object Landmark Representations","authors":"Zezhou Cheng, Jong-Chyi Su, Subhransu Maji","doi":"10.1109/ICCV48922.2021.00975","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00975","url":null,"abstract":"Given a collection of images, humans are able to discover landmarks by modeling the shared geometric structure across instances. This idea of geometric equivariance has been widely used for the unsupervised discovery of object landmark representations. In this paper, we develop a simple and effective approach by combining instance-discriminative and spatially-discriminative contrastive learning. We show that when a deep network is trained to be invariant to geometric and photometric transformations, representations emerge from its intermediate layers that are highly predictive of object landmarks. Stacking these across layers in a \"hypercolumn\" and projecting them using spatially-contrastive learning further improves their performance on matching and few-shot landmark regression tasks. We also present a unified view of existing equivariant and invariant representation learning approaches through the lens of contrastive learning, shedding light on the nature of invariances learned. Experiments on standard benchmarks for landmark learning, as well as a new challenging one we propose, show that the proposed approach surpasses prior state-of-the-art.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"33 1","pages":"9877-9886"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81337686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Dynamic DETR: End-to-End Object Detection with Dynamic Attention
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00298
Xiyang Dai, Yinpeng Chen, Jianwei Yang, Pengchuan Zhang, Lu Yuan, Lei Zhang
In this paper, we present a novel Dynamic DETR (Detection with Transformers) approach by introducing dynamic attention into both the encoder and decoder stages of DETR to break its two limitations of small feature resolution and slow training convergence. To address the first limitation, which is due to the quadratic computational complexity of the self-attention module in Transformer encoders, we propose a convolution-based dynamic encoder with various attention types to approximate the Transformer encoder’s attention mechanism. Such an encoder can dynamically adjust attention based on multiple factors such as scale importance, spatial importance, and representation (i.e., feature dimension) importance. To mitigate the second limitation, learning difficulty, we introduce a dynamic decoder by replacing the cross-attention module with an ROI-based dynamic attention in the Transformer decoder. Such a decoder effectively assists Transformers in focusing on regions of interest in a coarse-to-fine manner and dramatically lowers the learning difficulty, leading to much faster convergence with fewer training epochs. We conduct a series of experiments to demonstrate our advantages. Our Dynamic DETR significantly reduces the training epochs (by 14×), yet yields much better performance (by 3.6 mAP). Meanwhile, in the standard 1× setup with a ResNet-50 backbone, we achieve a new state-of-the-art performance that further proves the learning effectiveness of the proposed approach.
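The ROI-based dynamic attention in the decoder can be illustrated with torchvision's roi_align: each object query carries a box estimate and pools features only from inside that box, rather than attending densely to the whole feature map. The sketch below is an assumption-laden simplification, not the paper's module; the box parameterization, the 7x7 pooling size, and the residual query update are illustrative choices.

```python
# Simplified ROI-based cross-attention: pool encoder features inside each
# query's predicted box and mix them back into the query.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class ROICrossAttention(nn.Module):
    def __init__(self, dim=256, pool=7):
        super().__init__()
        self.pool = pool
        self.to_box = nn.Linear(dim, 4)             # query -> (cx, cy, w, h) in [0, 1]
        self.mix = nn.Linear(dim * pool * pool, dim)

    def forward(self, queries, feat):
        # queries: (B, Q, dim); feat: (B, dim, H, W)
        B, Q, dim = queries.shape
        H, W = feat.shape[-2:]
        boxes = self.to_box(queries).sigmoid()      # normalized boxes
        cx, cy, w, h = boxes.unbind(-1)
        x1, y1 = (cx - w / 2) * W, (cy - h / 2) * H
        x2, y2 = (cx + w / 2) * W, (cy + h / 2) * H
        rois = []
        for b in range(B):
            idx = torch.full((Q, 1), float(b))      # batch index column for roi_align
            rois.append(torch.cat([idx, torch.stack([x1[b], y1[b], x2[b], y2[b]], dim=-1)], dim=1))
        rois = torch.cat(rois, dim=0)               # (B*Q, 5)
        pooled = roi_align(feat, rois, output_size=self.pool, aligned=True)
        pooled = pooled.flatten(1).view(B, Q, -1)
        return queries + self.mix(pooled)           # residual update of the queries

layer = ROICrossAttention()
out = layer(torch.randn(2, 10, 256), torch.randn(2, 256, 32, 32))
print(out.shape)   # torch.Size([2, 10, 256])
```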
{"title":"Dynamic DETR: End-to-End Object Detection with Dynamic Attention","authors":"Xiyang Dai, Yinpeng Chen, Jianwei Yang, Pengchuan Zhang, Lu Yuan, Lei Zhang","doi":"10.1109/ICCV48922.2021.00298","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00298","url":null,"abstract":"In this paper, we present a novel Dynamic DETR (Detection with Transformers) approach by introducing dynamic attentions into both the encoder and decoder stages of DETR to break its two limitations on small feature resolution and slow training convergence. To address the first limitation, which is due to the quadratic computational complexity of the self-attention module in Transformer encoders, we propose a dynamic encoder to approximate the Transformer encoder’s attention mechanism using a convolution-based dynamic encoder with various attention types. Such an encoder can dynamically adjust attentions based on multiple factors such as scale importance, spatial importance, and representation (i.e., feature dimension) importance. To mitigate the second limitation of learning difficulty, we introduce a dynamic decoder by replacing the cross-attention module with a ROI-based dynamic attention in the Transformer decoder. Such a decoder effectively assists Transformers to focus on region of interests from a coarse-to-fine manner and dramatically lowers the learning difficulty, leading to a much faster convergence with fewer training epochs. We conduct a series of experiments to demonstrate our advantages. Our Dynamic DETR significantly reduces the training epochs (by 14×), yet results in a much better performance (by 3.6 on mAP). Meanwhile, in the standard 1× setup with ResNet-50 backbone, we archive a new state-of-the-art performance that further proves the learning effectiveness of the proposed approach.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"31 1","pages":"2968-2977"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84723203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 142
SketchAA: Abstract Representation for Abstract Sketches
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00994
Lan Yang, Kaiyue Pang, Honggang Zhang, Yi-Zhe Song
What makes free-hand sketches appealing to humans lies in their capability as a universal tool to depict the visual world. Such flexibility, however, introduces abstract renderings that pose unique challenges to computer vision models. In this paper, we propose a purpose-made representation for human sketches. The key intuition is that such a representation should be abstract by design, so as to accommodate the abstract nature of sketches. This is achieved by interpreting sketch abstraction on two levels: appearance and structure. We abstract sketch structure as a pre-defined coarse-to-fine visual block hierarchy, and average visual features within each block to model appearance abstraction. We then discuss three general strategies for exploiting feature synergy across different levels of this abstraction hierarchy. The superiority of explicitly abstracting the sketch representation is empirically validated on a number of sketch analysis tasks, including sketch recognition, fine-grained sketch-based image retrieval, and generative sketch healing. Our simple design not only yields strong results on all said tasks, but also offers intuitive feature granularity control to tailor for various downstream tasks. Code will be made publicly available.
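The coarse-to-fine visual block hierarchy with per-block feature averaging maps naturally onto adaptive average pooling at several grid sizes. The snippet below is a minimal sketch under assumed grid sizes (1x1, 2x2, 4x4) and an assumed backbone feature shape, not the paper's exact design.

```python
# Average a feature map inside each block of increasingly fine grids.
import torch
import torch.nn.functional as F

def block_hierarchy(feat, grids=(1, 2, 4)):
    """feat: (B, C, H, W) -> list of per-level block descriptors (B, C, g, g)."""
    return [F.adaptive_avg_pool2d(feat, g) for g in grids]

feat = torch.randn(2, 256, 16, 16)           # e.g. a CNN feature map of a sketch
levels = block_hierarchy(feat)
for lvl in levels:
    print(lvl.shape)   # (2, 256, 1, 1), (2, 256, 2, 2), (2, 256, 4, 4)

# Concatenating the flattened levels is one simple way to exploit cross-level
# synergy; the paper discusses several such strategies.
desc = torch.cat([lvl.flatten(1) for lvl in levels], dim=1)
print(desc.shape)      # (2, 256 * (1 + 4 + 16))
```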
{"title":"SketchAA: Abstract Representation for Abstract Sketches","authors":"Lan Yang, Kaiyue Pang, Honggang Zhang, Yi-Zhe Song","doi":"10.1109/ICCV48922.2021.00994","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00994","url":null,"abstract":"What makes free-hand sketches appealing for humans lies with its capability as a universal tool to depict the visual world. Such flexibility at human ease, however, introduces abstract renderings that pose unique challenges to computer vision models. In this paper, we propose a purpose-made sketch representation for human sketches. The key intuition is that such representation should be abstract at design, so to accommodate the abstract nature of sketches. This is achieved by interpreting sketch abstraction on two levels: appearance and structure. We abstract sketch structure as a pre-defined coarse-to-fine visual block hierarchy, and average visual features within each block to model appearance abstraction. We then discuss three general strategies on how to exploit feature synergy across different levels of this abstraction hierarchy. The superiority of explicitly abstracting sketch representation is empirically validated on a number of sketch analysis tasks, including sketch recognition, fine-grained sketch-based image retrieval, and generative sketch healing. Our simple design not only yields strong results on all said tasks, but also offers intuitive feature granularity control to tailor for various downstream tasks. Code will be made publicly available.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"97 1","pages":"10077-10086"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84202876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Pose Invariant Topological Memory for Visual Navigation
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01510
Asuto Taniguchi, Fumihiro Sasaki, R. Yamashina
Planning for visual navigation using topological memory, a memory graph consisting of nodes and edges, has recently been well studied. The nodes correspond to past observations of a robot, and the edges represent the reachability predicted by a neural network (NN). Most prior methods, however, often fail to predict the reachability when the robot takes different poses, i.e. faces different directions, at close positions. This is because the methods observe first-person-view images, which change significantly when the robot changes its pose, and thus it is fundamentally difficult to correctly predict the reachability from them. In this paper, we propose pose invariant topological memory (POINT) to address the problem. POINT observes omnidirectional images and predicts the reachability using a spherical convolutional NN, which has a rotation-invariance property and enables planning regardless of the robot’s pose. Additionally, we train the NN by contrastive learning with data augmentation to enable POINT to plan robustly against changes in environmental conditions, such as lighting conditions and the presence of unseen objects. Our experimental results show that POINT outperforms conventional methods under both the same and different environmental conditions. In addition, results with the KITTI-360 dataset show that POINT is more applicable to real-world environments than conventional methods.
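The topological-memory side of the method can be sketched as a graph whose nodes are past observations and whose edges come from a reachability predictor; planning then reduces to a shortest-path query. In the sketch below, a cosine similarity over random feature vectors stands in for the paper's learned spherical-CNN predictor, and the edge threshold is arbitrary.

```python
# Build a topological memory graph and plan with a shortest-path query.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 64))       # stand-in features of 20 past observations

def reachability(a, b):
    """Placeholder reachability score: cosine similarity of observation features."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

graph = nx.Graph()
graph.add_nodes_from(range(len(embeddings)))
for i in range(len(embeddings)):
    for j in range(i + 1, len(embeddings)):
        score = reachability(embeddings[i], embeddings[j])
        if score > 0.2:                       # threshold is an arbitrary choice here
            graph.add_edge(i, j, weight=1.0 - score)

# Planning: shortest path from the node nearest the current observation (here 0)
# to the node nearest the goal observation (here 19).
if nx.has_path(graph, 0, 19):
    print(nx.shortest_path(graph, 0, 19, weight="weight"))
```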
{"title":"Pose Invariant Topological Memory for Visual Navigation","authors":"Asuto Taniguchi, Fumihiro Sasaki, R. Yamashina","doi":"10.1109/ICCV48922.2021.01510","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01510","url":null,"abstract":"Planning for visual navigation using topological memory, a memory graph consisting of nodes and edges, has been recently well-studied. The nodes correspond to past observations of a robot, and the edges represent the reachability predicted by a neural network (NN). Most prior methods, however, often fail to predict the reachability when the robot takes different poses, i.e. the direction the robot faces, at close positions. This is because the methods observe first-person view images, which significantly changes when the robot changes its pose, and thus it is fundamentally difficult to correctly predict the reachability from them. In this paper, we propose pose invariant topological memory (POINT) to address the problem. POINT observes omnidirectional images and predicts the reachability by using a spherical convolutional NN, which has a rotation invariance property and enables planning regardless of the robot’s pose. Additionally, we train the NN by contrastive learning with data augmentation to enable POINT to plan with robustness to changes in environmental conditions, such as light conditions and the presence of unseen objects. Our experimental results show that POINT outperforms conventional methods under both the same and different environmental conditions. In addition, the results with the KITTI-360 dataset show that POINT is more applicable to real-world environments than conventional methods.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"109 1","pages":"15364-15373"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88055796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Self-supervised 3D Skeleton Action Representation Learning with Motion Consistency and Continuity
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01308
Yukun Su, Guosheng Lin, Qingyao Wu
Recently, self-supervised learning (SSL) has proven very effective and can help boost the performance of representation learning from unlabeled data in the image domain. Yet, very little has been explored about its usefulness in 3D skeleton-based action recognition. Directly applying existing SSL techniques to 3D skeleton learning, however, suffers from trivial solutions and imprecise representations. To tackle these drawbacks, we consider perceiving the consistency and continuity of motion at different playback speeds to be two critical issues. To this end, we propose a novel SSL method to learn the 3D skeleton representation in an efficacious way. Specifically, by constructing a positive clip (speed-changed) and a negative clip (motion-broken) of the sampled action sequence, we pull the positive pairs closer while pushing the negative pairs apart, forcing the network to learn the intrinsic dynamic motion consistency information. Moreover, to enhance the learned features, skeleton interpolation is further exploited to model the continuity of human skeleton data. To validate the effectiveness of the proposed method, extensive experiments are conducted on the Kinetics, NTU60, NTU120, and PKUMMD datasets with several alternative network architectures. Experimental evaluations demonstrate the superiority of our approach, through which we gain significant performance improvement without using extra labeled data.
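The positive and negative clips the abstract describes can be constructed with a few tensor operations: temporal interpolation yields a speed-changed positive, and frame shuffling yields a motion-broken negative. The tensor layout (channels x frames x joints) and the speed factor below are assumptions for illustration, not the paper's exact procedure.

```python
# Build a speed-changed positive clip and a motion-broken negative clip.
import torch
import torch.nn.functional as F

def speed_change(clip, factor=0.5):
    """clip: (C, T, J) joint coordinates over time; resample the time axis by `factor`."""
    c, t, j = clip.shape
    new_t = max(2, int(t * factor))
    # interpolate over the temporal dimension (joints act as the other spatial dim)
    return F.interpolate(clip.unsqueeze(0), size=(new_t, j),
                         mode="bilinear", align_corners=True).squeeze(0)

def motion_break(clip):
    """Destroy temporal continuity by shuffling frames."""
    perm = torch.randperm(clip.shape[1])
    return clip[:, perm, :]

clip = torch.randn(3, 64, 25)                # xyz coords, 64 frames, 25 joints
positive = speed_change(clip, factor=0.5)    # same motion, different playback speed
negative = motion_break(clip)                # broken dynamics, to be pushed away
print(positive.shape, negative.shape)
```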
{"title":"Self-supervised 3D Skeleton Action Representation Learning with Motion Consistency and Continuity","authors":"Yukun Su, Guosheng Lin, Qingyao Wu","doi":"10.1109/ICCV48922.2021.01308","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01308","url":null,"abstract":"Recently, self-supervised learning (SSL) has been proved very effective and it can help boost the performance in learning representations from unlabeled data in the image domain. Yet, very little is explored about its usefulness in 3D skeleton-based action recognition understanding. Directly applying existing SSL techniques for 3D skeleton learning, however, suffers from trivial solutions and imprecise representations. To tackle these drawbacks, we consider perceiving the consistency and continuity of motion at different playback speeds are two critical issues. To this end, we propose a novel SSL method to learn the 3D skeleton representation in an efficacious way. Specifically, by constructing a positive clip (speed-changed) and a negative clip (motion-broken) of the sampled action sequence, we encourage the positive pairs closer while pushing the negative pairs to force the network to learn the intrinsic dynamic motion consistency information. Moreover, to enhance the learning features, skeleton interpolation is further exploited to model the continuity of human skeleton data. To validate the effectiveness of the proposed method, extensive experiments are conducted on Kinetics, NTU60, NTU120, and PKUMMD datasets with several alternative network architectures. Experimental evaluations demonstrate the superiority of our approach and through which, we can gain significant performance improvement without using extra labeled data.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"13308-13318"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88140063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 48
Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning
Pub Date : 2021-10-01 DOI: 10.1109/iccv48922.2021.00933
Sungyong Baik, Janghoon Choi, Heewon Kim, Dohee Cho, Jaesik Min, Kyoung Mu Lee
In few-shot learning scenarios, the challenge is to generalize and perform well on new unseen examples when only very few labeled examples are available for each task. Model-agnostic meta-learning (MAML) has gained popularity as one of the representative few-shot learning methods for its flexibility and applicability to diverse problems. However, MAML and its variants often resort to a simple loss function without any auxiliary loss function or regularization terms that could help achieve better generalization. The problem is that each application and task may require a different auxiliary loss function, especially when tasks are diverse and distinct. Instead of attempting to hand-design an auxiliary loss function for each application and task, we introduce a new meta-learning framework with a loss function that adapts to each task. Our proposed framework, named Meta-Learning with Task-Adaptive Loss Function (MeTAL), demonstrates effectiveness and flexibility across various domains, such as few-shot classification and few-shot regression.
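The core mechanism, an inner-loop adaptation step whose loss comes from a small learned network over predictions and labels rather than a hand-designed auxiliary term, can be sketched with a tiny model. This is a simplified illustration of the general idea, not MeTAL itself; the architectures, the single adaptation step, and the inner learning rate are assumptions.

```python
# MAML-style inner loop where the inner loss is produced by a learned loss network.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(4, 2)                      # tiny task model
loss_net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))

def adapted_forward(x, y_onehot, inner_lr=0.1):
    """One inner-loop step whose loss is produced by the learned loss network."""
    logits = model(x)
    loss_in = torch.cat([logits, y_onehot], dim=-1)           # (N, 2 + 2)
    inner_loss = loss_net(loss_in).mean()
    grads = torch.autograd.grad(inner_loss, list(model.parameters()), create_graph=True)
    fast_w = model.weight - inner_lr * grads[0]
    fast_b = model.bias - inner_lr * grads[1]
    return lambda q: F.linear(q, fast_w, fast_b)              # adapted task model

x_support = torch.randn(5, 4)
y_support = F.one_hot(torch.randint(0, 2, (5,)), 2).float()
x_query, y_query = torch.randn(5, 4), torch.randint(0, 2, (5,))

adapted = adapted_forward(x_support, y_support)
outer_loss = F.cross_entropy(adapted(x_query), y_query)
outer_loss.backward()    # gradients reach both the task model and the loss network
print(outer_loss.item())
```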
{"title":"Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning","authors":"Sungyong Baik, Janghoon Choi, Heewon Kim, Dohee Cho, Jaesik Min, Kyoung Mu Lee","doi":"10.1109/iccv48922.2021.00933","DOIUrl":"https://doi.org/10.1109/iccv48922.2021.00933","url":null,"abstract":"In few-shot learning scenarios, the challenge is to generalize and perform well on new unseen examples when only very few labeled examples are available for each task. Model-agnostic meta-learning (MAML) has gained the popularity as one of the representative few-shot learning methods for its flexibility and applicability to diverse problems. However, MAML and its variants often resort to a simple loss function without any auxiliary loss function or regularization terms that can help achieve better generalization. The problem lies in that each application and task may require different auxiliary loss function, especially when tasks are diverse and distinct. Instead of attempting to hand-design an auxiliary loss function for each application and task, we introduce a new meta-learning framework with a loss function that adapts to each task. Our proposed framework, named Meta-Learning with Task-Adaptive Loss Function (MeTAL), demonstrates the effectiveness and the flexibility across various domains, such as few-shot classification and few-shot regression.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"51 1","pages":"9445-9454"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86756667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 46
Semantic Aware Data Augmentation for Cell Nuclei Microscopical Images with Artificial Neural Networks
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00392
Alireza Naghizadeh, Hongye Xu, Mohab Mohamed, Dimitris N. Metaxas, Dongfang Liu
There exist many powerful architectures for object detection and semantic segmentation of both biomedical and natural images. However, creating training datasets that are large and well varied remains difficult. The importance of this subject stems from the amount of training data that artificial neural networks need to accurately identify and segment objects in images, and from the infeasibility of acquiring a sufficient dataset within the biomedical field. This paper introduces a new data augmentation method that generates artificial cell nuclei microscopical images along with their correct semantic segmentation labels. Data augmentation provides a step toward accessing higher generalization capabilities of artificial neural networks. An initial set of segmentation objects is used with Greedy AutoAugment to find the strongest-performing augmentation policies. The found policies and the initial set of segmentation objects are then used in the creation of the final artificial images. When comparing state-of-the-art data augmentation methods with the proposed method, the proposed method is shown to consistently outperform current solutions in the generation of nuclei microscopical images.
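A greedy augmentation-policy search in the spirit of Greedy AutoAugment grows the policy one operation at a time, keeping whichever candidate scores best. In the sketch below the candidate operations are generic torchvision transforms and the scoring function is a random placeholder for the real "train a model, measure validation performance" step; both are assumptions, not the paper's setup.

```python
# Greedily extend an augmentation policy, one operation at a time.
import random
from torchvision import transforms

candidate_ops = [
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.4),
    transforms.RandomResizedCrop(size=64, scale=(0.8, 1.0)),
]

def evaluate(policy):
    """Stand-in score; in practice this would train and validate a segmentation model."""
    return random.random()

policy = []
for _ in range(2):                            # greedily grow a 2-op policy
    best_op, best_score = None, evaluate(policy)
    for op in candidate_ops:
        score = evaluate(policy + [op])
        if score > best_score:
            best_op, best_score = op, score
    if best_op is not None:
        policy.append(best_op)

augment = transforms.Compose(policy)          # the found policy, ready to apply to images
print([type(op).__name__ for op in policy])
```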
{"title":"Semantic Aware Data Augmentation for Cell Nuclei Microscopical Images with Artificial Neural Networks","authors":"Alireza Naghizadeh, Hongye Xu, Mohab Mohamed, Dimitris N. Metaxas, Dongfang Liu","doi":"10.1109/ICCV48922.2021.00392","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00392","url":null,"abstract":"There exists many powerful architectures for object detection and semantic segmentation of both biomedical and natural images. However, a difficulty arises in the ability to create training datasets that are large and well-varied. The importance of this subject is nested in the amount of training data that artificial neural networks need to accurately identify and segment objects in images and the infeasibility of acquiring a sufficient dataset within the biomedical field. This paper introduces a new data augmentation method that generates artificial cell nuclei microscopical images along with their correct semantic segmentation labels. Data augmentation provides a step toward accessing higher generalization capabilities of artificial neural networks. An initial set of segmentation objects is used with Greedy AutoAugment to find the strongest performing augmentation policies. The found policies and the initial set of segmentation objects are then used in the creation of the final artificial images. When comparing the state-of-the-art data augmentation methods with the proposed method, the proposed method is shown to consistently outperform current solutions in the generation of nuclei microscopical images.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"31 1","pages":"3932-3941"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87024820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Cortical Surface Shape Analysis Based on Alexandrov Polyhedra
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01398
M. Zhang, Yang Guo, Na Lei, Zhou Zhao, Jianfeng Wu, Xiaoyin Xu, Yalin Wang, X. Gu
Shape analysis has been playing an important role in the early diagnosis and prognosis of neurodegenerative diseases such as Alzheimer’s disease (AD). However, obtaining effective shape representations remains challenging. This paper proposes to use Alexandrov polyhedra as surface-based shape signatures for cortical morphometry analysis. Given a closed genus-0 surface, its Alexandrov polyhedron is a convex representation that encodes its intrinsic geometry information. We propose to compute the polyhedra via a novel spherical optimal transport (OT) computation. In our experiments, we observe that the Alexandrov polyhedra of cortical surfaces are significantly different between pathology-confirmed AD and cognitively unimpaired individuals. Moreover, we propose a visualization method that compares local geometry differences across cortical surfaces. We show that the proposed method is effective in pinpointing regional cortical structural changes impacted by AD.
{"title":"Cortical Surface Shape Analysis Based on Alexandrov Polyhedra","authors":"M. Zhang, Yang Guo, Na Lei, Zhou Zhao, Jianfeng Wu, Xiaoyin Xu, Yalin Wang, X. Gu","doi":"10.1109/ICCV48922.2021.01398","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01398","url":null,"abstract":"Shape analysis has been playing an important role in early diagnosis and prognosis of neurodegenerative diseases such as Alzheimer’s diseases (AD). However, obtaining effective shape representations remains challenging. This paper proposes to use the Alexandrov polyhedra as surface-based shape signatures for cortical morphometry analysis. Given a closed genus-0 surface, its Alexandrov polyhedron is a convex representation that encodes its intrinsic geometry information. We propose to compute the polyhedra via a novel spherical optimal transport (OT) computation. In our experiments, we observe that the Alexandrov polyhedra of cortical surfaces between pathology-confirmed AD and cognitively unimpaired individuals are significantly different. Moreover, we propose a visualization method by comparing local geometry differences across cortical surfaces. We show that the proposed method is effective in pinpointing regional cortical structural changes impacted by AD.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"7 1","pages":"14224-14232"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87036212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
De-rendering Stylized Texts
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00111
Wataru Shimoda, Daichi Haraguchi, S. Uchida, Kota Yamaguchi
Editing raster text is a promising but challenging task. We propose to apply text vectorization for the task of raster text editing in display media, such as posters, web pages, or advertisements. In our approach, instead of applying image transformation or generation in the raster domain, we learn a text vectorization model to parse all the rendering parameters including text, location, size, font, style, effects, and hidden background, then utilize those parameters for reconstruction and any editing task. Our text vectorization takes advantage of differentiable text rendering to accurately reproduce the input raster text in a resolution-free parametric format. We show in the experiments that our approach can successfully parse text, styling, and background information in the unified model, and produces artifact-free text editing compared to a raster baseline.
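The parse-then-edit workflow the abstract describes reduces editing to changing predicted parameters and re-rendering. In the sketch below, PIL's renderer is a non-differentiable stand-in for the paper's differentiable text renderer, and the parameter set and values are made up for illustration.

```python
# Parametric text rendering: edit the parameters, not the pixels.
from PIL import Image, ImageDraw, ImageFont

def render(params):
    img = Image.new("RGB", params["canvas"], params["background"])
    draw = ImageDraw.Draw(img)
    draw.text(params["position"], params["text"],
              fill=params["color"], font=ImageFont.load_default())
    return img

# Parameters as a model would predict them from a raster input (values made up here).
params = {
    "canvas": (320, 80),
    "background": (255, 255, 255),
    "position": (10, 30),
    "color": (200, 30, 30),
    "text": "SUMMER SALE",
}
original = render(params)

# Editing becomes a parameter change followed by re-rendering, not pixel surgery.
params["text"] = "WINTER SALE"
edited = render(params)
edited.save("edited_text.png")
```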
{"title":"De-rendering Stylized Texts","authors":"Wataru Shimoda, Daichi Haraguchi, S. Uchida, Kota Yamaguchi","doi":"10.1109/ICCV48922.2021.00111","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00111","url":null,"abstract":"Editing raster text is a promising but challenging task. We propose to apply text vectorization for the task of raster text editing in display media, such as posters, web pages, or advertisements. In our approach, instead of applying image transformation or generation in the raster domain, we learn a text vectorization model to parse all the rendering parameters including text, location, size, font, style, effects, and hidden background, then utilize those parameters for reconstruction and any editing task. Our text vectorization takes advantage of differentiable text rendering to accurately reproduce the input raster text in a resolution-free parametric format. We show in the experiments that our approach can successfully parse text, styling, and background information in the unified model, and produces artifact-free text editing compared to a raster baseline.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"1056-1065"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87105832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
NeuSpike-Net: High Speed Video Reconstruction via Bio-inspired Neuromorphic Cameras
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00240
Lin Zhu, Jianing Li, Xiao Wang, Tiejun Huang, Yonghong Tian
The neuromorphic vision sensor is a new bio-inspired imaging paradigm that has emerged in recent years; it continuously senses luminance intensity and fires asynchronous spikes (events) with high temporal resolution. Typically, there are two types of neuromorphic vision sensors, namely the dynamic vision sensor (DVS) and the spike camera. From the perspective of bio-inspired sampling, DVS only perceives movement by imitating the retinal periphery, while the spike camera was developed to perceive fine textures by simulating the fovea. It is therefore meaningful to explore how to combine the two types of neuromorphic cameras to reconstruct high-quality images, as human vision does. In this paper, we propose NeuSpike-Net, which learns both the high dynamic range and high motion sensitivity of DVS and the full texture sampling of the spike camera to achieve high-speed and high-dynamic image reconstruction. We propose a novel representation to effectively extract the temporal information of spike and event data. By introducing a feature fusion module, the two types of neuromorphic data complement each other. Experimental results on simulated and real datasets demonstrate that the proposed approach is effective in reconstructing high-speed and high-dynamic-range images from the combination of spike and event data.
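One way to picture the two input streams is to rasterize them into dense tensors a fusion network could consume: DVS events binned into a temporal voxel grid (motion), and spike-camera spikes summarized as per-pixel firing rates (texture). The bin count, shapes, and simple channel concatenation below are assumptions for illustration, not the paper's representation or fusion module.

```python
# Rasterize event and spike streams into dense tensors and concatenate them.
import torch

def event_voxel_grid(events, height, width, bins=5):
    """events: (N, 4) rows of (x, y, t, polarity in {0, 1}); returns (bins, H, W)."""
    grid = torch.zeros(bins, height, width)
    x, y, t, p = events[:, 0].long(), events[:, 1].long(), events[:, 2], events[:, 3]
    t_norm = (t - t.min()) / (t.max() - t.min() + 1e-9)
    b = (t_norm * (bins - 1)).long()
    values = p * 2 - 1                       # map polarity 0/1 to -1/+1
    grid.index_put_((b, y, x), values, accumulate=True)
    return grid

def spike_firing_rate(spikes):
    """spikes: (T, H, W) binary spike planes; returns (1, H, W) mean firing rate."""
    return spikes.float().mean(dim=0, keepdim=True)

H, W = 64, 64
events = torch.stack([torch.randint(0, W, (1000,)).float(),
                      torch.randint(0, H, (1000,)).float(),
                      torch.rand(1000),
                      torch.randint(0, 2, (1000,)).float()], dim=1)
spikes = torch.randint(0, 2, (40, H, W))

fused = torch.cat([event_voxel_grid(events, H, W), spike_firing_rate(spikes)], dim=0)
print(fused.shape)    # (6, 64, 64): motion-sensitive bins plus a texture channel
```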
{"title":"NeuSpike-Net: High Speed Video Reconstruction via Bio-inspired Neuromorphic Cameras","authors":"Lin Zhu, Jianing Li, Xiao Wang, Tiejun Huang, Yonghong Tian","doi":"10.1109/ICCV48922.2021.00240","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00240","url":null,"abstract":"Neuromorphic vision sensor is a new bio-inspired imaging paradigm that emerged in recent years, which continuously sensing luminance intensity and firing asynchronous spikes (events) with high temporal resolution. Typically, there are two types of neuromorphic vision sensors, namely dynamic vision sensor (DVS) and spike camera. From the perspective of bio-inspired sampling, DVS only perceives movement by imitating the retinal periphery, while the spike camera was developed to perceive fine textures by simulating the fovea. It is meaningful to explore how to combine two types of neuromorphic cameras to reconstruct high quality image like human vision. In this paper, we propose a NeuSpike-Net to learn both the high dynamic range and high motion sensitivity of DVS and the full texture sampling of spike camera to achieve high-speed and high dynamic image reconstruction. We propose a novel representation to effectively extract the temporal information of spike and event data. By introducing the feature fusion module, the two types of neuromorphic data achieve complementary to each other. The experimental results on the simulated and real datasets demonstrate that the proposed approach is effective to reconstruct high-speed and high dynamic range images via the combination of spike and event data.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"70 1","pages":"2380-2389"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85637653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19