
Latest publications from the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

End-to-End Illuminant Estimation Based on Deep Metric Learning
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.00367
Bolei Xu, Jingxin Liu, Xianxu Hou, Bozhi Liu, G. Qiu
Previous deep learning approaches to color constancy usually estimate the illuminant value directly from the input image. Such approaches can be highly sensitive to variations in image content. To overcome this problem, we introduce a deep metric learning approach to color constancy named the Illuminant-Guided Triplet Network (IGTN). IGTN generates an Illuminant Consistent and Discriminative Feature (ICDF) for robust and accurate illuminant color estimation. The ICDF is composed of semantic and color features based on a learnable color histogram scheme. In the ICDF space, regardless of the similarity of their contents, images taken under the same or similar illuminants are placed close to each other, while images taken under different illuminants are placed far apart. We also adopt an end-to-end training strategy that simultaneously groups image features and estimates the illuminant value, so our approach does not need a separate illuminant classification module. We evaluate our method on two public datasets and demonstrate that it outperforms state-of-the-art approaches. Furthermore, we show that our method is less sensitive to image appearance and achieves more robust and consistent results than other methods on a High Dynamic Range dataset.
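As a rough sketch of how a triplet objective keyed to illuminants can be trained end-to-end alongside illuminant regression (the encoder, feature dimension, and sampling below are illustrative assumptions, not the authors' IGTN):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch: a shared encoder produces an ICDF-like embedding plus an
# illuminant estimate, and a triplet loss groups images by illuminant rather
# than by content. All layers and shapes are placeholders.
class IlluminantNet(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.embed = nn.Linear(64, dim)   # feature used for metric learning
        self.illum = nn.Linear(64, 3)     # RGB illuminant estimate

    def forward(self, x):
        h = self.backbone(x)
        return F.normalize(self.embed(h), dim=1), self.illum(h)

net = IlluminantNet()
triplet = nn.TripletMarginLoss(margin=0.2)

# Anchor/positive are taken under the same (or similar) illuminant; the
# negative comes from a different illuminant, regardless of image content.
anchor, positive, negative = (torch.rand(4, 3, 64, 64) for _ in range(3))
gt_illuminant = torch.rand(4, 3)

f_a, i_a = net(anchor)
f_p, _ = net(positive)
f_n, _ = net(negative)

# End-to-end objective: grouping (metric) loss plus illuminant regression.
loss = triplet(f_a, f_p, f_n) + F.mse_loss(i_a, gt_illuminant)
loss.backward()
```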
Citations: 19
Fast Texture Synthesis via Pseudo Optimizer
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.00554
Wu Shi, Y. Qiao
Texture synthesis using deep neural networks can generate high-quality and diverse textures. However, it usually requires a heavy optimization process. Follow-up works accelerate the process by using feed-forward networks, but at the cost of scalability, diversity, or quality. We propose a new, efficient method that aims to simulate the optimization process while retaining most of its properties. Our method takes a noise image and the gradients from a descriptor network as inputs and synthesizes a refined image with respect to the target image. The proposed method can synthesize images with better quality and diversity than other fast synthesis methods. Moreover, when trained on a large-scale dataset, our method can generalize to synthesize unseen textures.
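A minimal sketch of the pseudo-optimizer idea, assuming a toy descriptor network and a Gram-matrix texture loss in place of the paper's actual components (every module and size below is a placeholder):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained descriptor network (e.g. VGG features);
# a fixed random conv stack is used only to keep the sketch self-contained.
descriptor = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
).requires_grad_(False)

def gram(f):
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # channel correlation statistics

# Pseudo optimizer: a learnable network that maps (current image, descriptor
# gradient) to a refined image, imitating one optimization step.
class PseudoOptimizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )
    def forward(self, img, grad):
        return img + self.net(torch.cat([img, grad], dim=1))   # residual update

target = torch.rand(1, 3, 64, 64)                      # target texture
img = torch.rand(1, 3, 64, 64, requires_grad=True)     # noise initialization

# Gradient of the texture (Gram) loss with respect to the current image.
loss = ((gram(descriptor(img)) - gram(descriptor(target))) ** 2).sum()
g, = torch.autograd.grad(loss, img)

refined = PseudoOptimizer()(img.detach(), g)           # one simulated step
print(refined.shape)
```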
Citations: 9
Information-Driven Direct RGB-D Odometry
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00498
Alejandro Fontan, Javier Civera, Rudolph Triebel
This paper presents an information-theoretic approach to point selection in direct RGB-D odometry. The aim is to select only the most informative measurements, in order to shrink the optimization problem with minimal impact on accuracy. It is usual practice in visual odometry/SLAM to track several hundred points, achieving real-time performance on high-end desktop PCs. Reducing their computational footprint will facilitate the implementation of odometry and SLAM on low-end platforms such as small robots and AR/VR glasses. Our experimental results show that our novel information-based selection criterion allows us to reduce the number of tracked points by an order of magnitude (down to only 24), achieving accuracy similar to the state of the art (sometimes outperforming it) while reducing the computational demand by a factor of 10.
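A generic way to realize such information-driven point selection is to greedily keep the residual Jacobians that most increase the log-determinant of the accumulated information matrix; the sketch below illustrates that criterion and is not necessarily the paper's exact formulation:

```python
import numpy as np

def select_informative_points(jacobians, k, prior=1e-6):
    """Greedily pick k points whose 6-DoF residual Jacobians add the most
    information, measured as the gain in log det of the accumulated Fisher
    information matrix (a generic information-theoretic criterion)."""
    H = prior * np.eye(6)                      # accumulated information matrix
    selected, remaining = [], set(range(len(jacobians)))
    for _ in range(k):
        _, base = np.linalg.slogdet(H)
        best, best_gain = None, -np.inf
        for i in remaining:
            J = jacobians[i].reshape(1, 6)     # 1x6 Jacobian of one residual
            _, logdet = np.linalg.slogdet(H + J.T @ J)
            if logdet - base > best_gain:
                best, best_gain = i, logdet - base
        selected.append(best)
        remaining.remove(best)
        J = jacobians[best].reshape(1, 6)
        H += J.T @ J                           # commit the selected point
    return selected

# Toy example: 200 candidate points, keep only the 24 most informative ones.
jacs = np.random.randn(200, 6)
print(select_informative_points(jacs, k=24)[:5])
```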
Citations: 15
Referring Image Segmentation via Cross-Modal Progressive Comprehension
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.01050
Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li
Referring image segmentation aims at segmenting the foreground masks of the entities that best match the description given in a natural language expression. Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities, but usually fail to exploit the informative words of the expression to align features from the two modalities well enough to accurately identify the referred entity. In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address this challenging task. Concretely, the CMPC module first employs entity and attribute words to perceive all the related entities that might be considered by the expression. Then, relational words are adopted to highlight the correct entity and suppress irrelevant ones through multimodal graph reasoning. In addition to the CMPC module, we further leverage a simple yet effective TGFE module to integrate the reasoned multimodal features from different levels under the guidance of textual information. In this way, features from multiple levels can communicate with each other and be refined based on the textual context. We conduct extensive experiments on four popular referring segmentation benchmarks and achieve new state-of-the-art performance. Code is available at https://github.com/spyflying/CMPC-Refseg.
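As a rough illustration of text-guided, multi-level feature exchange (not the authors' CMPC/TGFE design; the gating scheme and all dimensions below are assumptions), consider the following sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative only: visual features from several backbone levels are gated by
# a sentence embedding, then the levels exchange information so the fused map
# used for mask prediction is text-aware.
class TextGuidedFusion(nn.Module):
    def __init__(self, vis_dim=256, txt_dim=300):
        super().__init__()
        self.gate = nn.Linear(txt_dim, vis_dim)
        self.mix = nn.Conv2d(2 * vis_dim, vis_dim, kernel_size=1)

    def forward(self, feats, sentence):
        # feats: list of (B, C, H_l, W_l) maps; sentence: (B, txt_dim)
        g = torch.sigmoid(self.gate(sentence))[:, :, None, None]
        guided = [f * g for f in feats]                      # text-conditioned gating
        size = guided[0].shape[-2:]
        up = [F.interpolate(f, size=size) for f in guided]   # align spatial sizes
        fused = up[0]
        for f in up[1:]:
            fused = self.mix(torch.cat([fused, f], dim=1))   # cross-level exchange
        return fused

feats = [torch.rand(2, 256, 32, 32), torch.rand(2, 256, 16, 16)]
sentence = torch.rand(2, 300)
print(TextGuidedFusion()(feats, sentence).shape)   # -> torch.Size([2, 256, 32, 32])
```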
Citations: 109
Fast MSER
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00344
Hailiang Xu, Siqi Xie, FangFu Chen
Maximally Stable Extremal Regions (MSER) algorithms are based on the component tree and are used to detect invariant regions. OpenCV MSER, the most popular MSER implementation, uses a linked list to associate pixels with ERs. The data structure of an ER contains the attributes of a head and a tail linked node, which makes OpenCV MSER hard to parallelize using existing parallel component tree strategies. Besides, pixel extraction (i.e., extracting the pixels in MSERs) in OpenCV MSER is very slow. In this paper, we propose two novel MSER algorithms, called Fast MSER V1 and V2. They first divide an image into several spatial partitions, then construct sub-trees and doubly linked lists (for V1) or a labelled image (for V2) on the partitions in parallel. V1 uses a novel sub-tree merging algorithm to merge the sub-trees into the final tree, merging the doubly linked lists in the process, while V2 merges the sub-trees using an existing merging algorithm. Finally, MSERs are recognized, and their pixels are extracted through two novel pixel extraction methods that exploit the fact that many pixels in parent and child MSERs are duplicated. Both V1 and V2 outperform three open-source MSER algorithms (28 and 26 times faster than OpenCV MSER, respectively), and reduce the memory used for the pixels in MSERs by 78%.
Citations: 1
Image Based Virtual Try-On Network From Unpaired Data
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00523
A. Neuberger, Eran Borenstein, Bar Hilleli, Eduard Oks, Sharon Alpert
This paper presents a new image-based virtual try-on approach (Outfit-VITON) that helps visualize how a composition of clothing items selected from various reference images forms a cohesive outfit on a person in a query image. Our algorithm has two distinctive properties. First, it is inexpensive, as it simply requires a large set of single (non-corresponding) images (both real and catalog) of people wearing various garments without explicit 3D information. The training phase requires only single images, eliminating the need for manually creating image pairs, where one image shows a person wearing a particular garment and the other shows the same catalog garment alone. Second, it can synthesize images of multiple garments composed into a single, coherent outfit, and it enables control of the type of garments rendered in the final outfit. Once trained, our approach can synthesize a cohesive outfit from multiple images of clothed human models, while fitting the outfit to the body shape and pose of the query person. An online optimization step takes care of fine details such as intricate textures and logos. Quantitative and qualitative evaluations on an image dataset containing large shape and style variations demonstrate superior accuracy compared to existing state-of-the-art methods, especially when dealing with highly detailed garments.
Citations: 72
Predicting Cognitive Declines Using Longitudinally Enriched Representations for Imaging Biomarkers
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00488
Lyujian Lu, Hua Wang, Saad Elbeleidy, F. Nie
With rapid progress in high-throughput genotyping and neuroimaging, research on complex brain disorders, such as Alzheimer's Disease (AD), has gained significant attention in recent years. Many prediction models have been studied to relate neuroimaging measures to cognitive status as these diseases progress. Missing data is one of the biggest challenges in accurately predicting subjects' cognitive scores in longitudinal neuroimaging studies. To tackle this problem, in this paper we propose a novel formulation to learn an enriched representation for imaging biomarkers that simultaneously captures the information conveyed by the baseline neuroimaging records and that conveyed by the progressive variations across the varying numbers of available follow-up records over time. Although the number of brain scans varies across participants, the learned biomarker representation for every participant is a fixed-length vector, which enables us to use traditional learning models to study AD development. Our new objective is formulated to maximize a ratio of summations of L1-norm distances for improved robustness, which, however, is difficult to solve efficiently in general. We therefore derive a new, efficient iterative solution algorithm and rigorously prove its convergence. We have performed extensive experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. A performance gain is achieved in predicting four different cognitive scores when we compare the original baseline representations against the learned enriched representations. These promising empirical results demonstrate the improved performance of our new method and validate its effectiveness.
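Schematically, the objective described above has the following ratio form, where $f_i(W)$ and $g_j(W)$ are placeholders for the L1-norm distance terms implied by the abstract (the paper's exact notation is not reproduced here); it is this ratio of sums that prevents a closed-form solution and motivates the iterative algorithm:

```latex
% Schematic form only: f_i(W) and g_j(W) stand for the L1-norm distance terms
% built from the enriched representation with parameters W.
\max_{W}\;
\frac{\sum_{i=1}^{n} \left\lVert f_i(W) \right\rVert_{1}}
     {\sum_{j=1}^{m} \left\lVert g_j(W) \right\rVert_{1}}
```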
Citations: 6
GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00622
Hongyi Xu, Eduard Gabriel Bazavan, Andrei Zanfir, W. Freeman, R. Sukthankar, C. Sminchisescu
We present a statistical, articulated 3D human shape modeling pipeline within a fully trainable, modular, deep learning framework. Given high-resolution complete 3D body scans of humans captured in various poses, together with additional closeups of their head and facial expressions as well as hand articulation, and given initial, artist-designed, gender-neutral rigged quad-meshes, we train all model parameters, including non-linear shape spaces based on variational auto-encoders, pose-space deformation correctives, skeleton joint center predictors, and blend skinning functions, in a single consistent learning loop. The models are trained simultaneously with all the 3D dynamic scan data (over 60,000 diverse human configurations in our new dataset) in order to capture correlations and ensure consistency of the various components. The models support facial expression analysis as well as body (with detailed hand) shape and pose estimation. We provide fully trainable generic human models at two resolutions: the moderate-resolution GHUM with 10,168 vertices and the low-resolution GHUML(ite) with 3,194 vertices. We run comparisons between them, analyze the impact of different components, and illustrate their reconstruction from image data. The models will be made available for research.
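The non-linear shape space mentioned above is built on variational auto-encoders; the sketch below shows a generic mesh-vertex VAE of that flavor. It is only an assumption-laden illustration (layer sizes, latent dimension, and loss weights are invented), not the GHUM/GHUML architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal VAE-based nonlinear shape space: registered mesh vertices are
# flattened, encoded to a latent shape code, and decoded back. 3,194 matches
# the GHUML(ite) vertex count quoted in the abstract; everything else here is
# an assumption.
class ShapeVAE(nn.Module):
    def __init__(self, n_vertices=3194, latent=16):
        super().__init__()
        d = 3 * n_vertices
        self.enc = nn.Sequential(nn.Linear(d, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent)
        self.logvar = nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, d))

    def forward(self, verts):
        h = self.enc(verts.flatten(1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        return self.dec(z).view(verts.shape), mu, logvar

verts = torch.rand(2, 3194, 3)                 # a batch of registered template meshes
recon, mu, logvar = ShapeVAE()(verts)
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
loss = F.mse_loss(recon, verts) + 1e-3 * kl    # reconstruction + KL regularizer
print(loss.item())
```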
Citations: 217
Three-Dimensional Reconstruction of Human Interactions
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00724
Mihai Fieraru, M. Zanfir, Elisabeta Oneata, A. Popa, Vlad Olaru, C. Sminchisescu
Understanding 3D human interactions is fundamental for fine-grained scene analysis and behavioral modeling. However, most of the existing models focus on analyzing a single person in isolation, and those that process several people focus largely on resolving multi-person data association rather than inferring interactions. This may lead to incorrect, lifeless 3D estimates that miss the subtle human contact aspects--the essence of the event--and are of little use for detailed behavioral understanding. This paper addresses such issues and makes several contributions: (1) we introduce models for interaction signature estimation (ISP) encompassing contact detection, segmentation, and 3D contact signature prediction; (2) we show how such components can be leveraged in order to produce augmented losses that ensure contact consistency during 3D reconstruction; (3) we construct several large datasets for learning and evaluating 3D contact prediction and reconstruction methods; specifically, we introduce CHI3D, a lab-based accurate 3D motion capture dataset with 631 sequences containing 2,525 contact events and 728,664 ground truth 3D poses, as well as FlickrCI3D, a dataset of 11,216 images, with 14,081 processed pairs of people, and 81,233 facet-level surface correspondences within 138,213 selected contact regions. Finally, (4) we present models and baselines to illustrate how contact estimation supports meaningful 3D reconstruction where essential interactions are captured. Models and data are made available for research purposes at http://vision.imar.ro/ci3d.
Citations: 57
Towards Transferable Targeted Attack
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00072
Maosen Li, Cheng Deng, Tengjiao Li, Junchi Yan, Xinbo Gao, Heng Huang
An intriguing property of adversarial examples is their transferability, which suggests that black-box attacks are feasible in real-world applications. Previous works mostly study transferability in the non-targeted setting. However, recent studies show that targeted adversarial examples are more difficult to transfer than non-targeted ones. In this paper, we identify two defects that make it difficult to generate transferable targeted examples. First, the magnitude of the gradient decreases during the iterative attack, causing excessive consistency between two successive noises in the accumulation of momentum, which we term noise curing. Second, it is not enough for targeted adversarial examples to merely get close to the target class without moving away from the true class. To overcome these problems, we propose a novel targeted attack approach that effectively generates more transferable adversarial examples. Specifically, we first introduce the Poincaré distance as the similarity metric to make the gradient magnitude self-adaptive during the iterative attack, alleviating noise curing. Furthermore, we regularize the targeted attack process with metric learning to push adversarial examples away from the true label and obtain more transferable targeted adversarial examples. Experiments on ImageNet validate the superiority of our approach, which achieves an 8% higher attack success rate on average than other state-of-the-art methods in black-box targeted attacks.
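For reference, the Poincaré distance used as the similarity metric has a standard closed form; the sketch below implements it, while the mapping of network outputs into the Poincaré ball (assumed here to be a simple rescaling) is an illustrative placeholder:

```python
import torch

def poincare_distance(u, v, eps=1e-5):
    """Geodesic distance in the Poincare ball model:
    d(u, v) = arccosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))."""
    uu = torch.clamp(1 - (u * u).sum(dim=-1), min=eps)
    vv = torch.clamp(1 - (v * v).sum(dim=-1), min=eps)
    duv = ((u - v) ** 2).sum(dim=-1)
    x = 1 + 2 * duv / (uu * vv)
    return torch.acosh(torch.clamp(x, min=1.0 + eps))

# Toy usage: points must lie strictly inside the unit ball, so rescale them.
# In the attack setting, u and v would be derived from logits and the target
# label embedding; that mapping is not reproduced here.
u = 0.1 * torch.rand(4, 10)
v = 0.1 * torch.rand(4, 10)
print(poincare_distance(u, v))
```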
Citations: 80