
Latest publications in Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision

SimpleRecon: 3D Reconstruction Without 3D Convolutions
Mohamed Sayed, J. Gibson, Jamie Watson, V. Prisacariu, Michael Firman, Clément Godard
Traditionally, 3D indoor scene reconstruction from posed images happens in two phases: per-image depth estimation, followed by depth merging and surface reconstruction. Recently, a family of methods have emerged that perform reconstruction directly in final 3D volumetric feature space. While these methods have shown impressive reconstruction results, they rely on expensive 3D convolutional layers, limiting their application in resource-constrained environments. In this work, we instead go back to the traditional route, and show how focusing on high quality multi-view depth prediction leads to highly accurate 3D reconstructions using simple off-the-shelf depth fusion. We propose a simple state-of-the-art multi-view depth estimator with two main contributions: 1) a carefully-designed 2D CNN which utilizes strong image priors alongside a plane-sweep feature volume and geometric losses, combined with 2) the integration of keyframe and geometric metadata into the cost volume which allows informed depth plane scoring. Our method achieves a significant lead over the current state-of-the-art for depth estimation and close or better for 3D reconstruction on ScanNet and 7-Scenes, yet still allows for online real-time low-memory reconstruction. Code, models and results are available at https://nianticlabs.github.io/simplerecon
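To make the metadata-augmented cost volume concrete, here is a minimal PyTorch sketch (not the authors' code): depth planes are scored with a feature dot product and per-plane keyframe/geometric metadata channels are concatenated so a lightweight 2D network can do informed plane scoring. Tensor shapes and the particular metadata are illustrative assumptions.

```python
import torch

def cost_volume_with_metadata(ref_feat, src_feat_warped, metadata):
    """
    ref_feat:        (B, C, H, W)    reference-view features
    src_feat_warped: (B, D, C, H, W) source features pre-warped onto D depth planes
    metadata:        (B, D, M, H, W) per-plane metadata (e.g. ray angles, pose distance; assumed)
    returns:         (B, D, 1 + M, H, W) cost volume: matching score plus metadata
    """
    # Dot-product matching score between the reference and each warped source plane.
    score = (ref_feat.unsqueeze(1) * src_feat_warped).sum(dim=2, keepdim=True)
    # Concatenate metadata so the depth-plane scoring can use it alongside the match score.
    return torch.cat([score, metadata], dim=2)

B, C, D, M, H, W = 1, 32, 64, 6, 48, 64
vol = cost_volume_with_metadata(torch.randn(B, C, H, W),
                                torch.randn(B, D, C, H, W),
                                torch.randn(B, D, M, H, W))
print(vol.shape)  # torch.Size([1, 64, 7, 48, 64])
```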
{"title":"SimpleRecon: 3D Reconstruction Without 3D Convolutions","authors":"Mohamed Sayed, J. Gibson, Jamie Watson, V. Prisacariu, Michael Firman, Clément Godard","doi":"10.48550/arXiv.2208.14743","DOIUrl":"https://doi.org/10.48550/arXiv.2208.14743","url":null,"abstract":"Traditionally, 3D indoor scene reconstruction from posed images happens in two phases: per-image depth estimation, followed by depth merging and surface reconstruction. Recently, a family of methods have emerged that perform reconstruction directly in final 3D volumetric feature space. While these methods have shown impressive reconstruction results, they rely on expensive 3D convolutional layers, limiting their application in resource-constrained environments. In this work, we instead go back to the traditional route, and show how focusing on high quality multi-view depth prediction leads to highly accurate 3D reconstructions using simple off-the-shelf depth fusion. We propose a simple state-of-the-art multi-view depth estimator with two main contributions: 1) a carefully-designed 2D CNN which utilizes strong image priors alongside a plane-sweep feature volume and geometric losses, combined with 2) the integration of keyframe and geometric metadata into the cost volume which allows informed depth plane scoring. Our method achieves a significant lead over the current state-of-the-art for depth estimation and close or better for 3D reconstruction on ScanNet and 7-Scenes, yet still allows for online real-time low-memory reconstruction. Code, models and results are available at https://nianticlabs.github.io/simplerecon","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79386930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 31
Style-Agnostic Reinforcement Learning
Juyong Lee, Seokjun Ahn, Jaesik Park
We present a novel method of learning style-agnostic representation using both style transfer and adversarial learning in the reinforcement learning framework. The style, here, refers to task-irrelevant details such as the color of the background in the images, where generalizing the learned policy across environments with different styles is still a challenge. Focusing on learning style-agnostic representations, our method trains the actor with diverse image styles generated from an inherent adversarial style perturbation generator, which plays a min-max game between the actor and the generator, without demanding expert knowledge for data augmentation or additional class labels for adversarial training. We verify that our method achieves competitive or better performances than the state-of-the-art approaches on Procgen and Distracting Control Suite benchmarks, and further investigate the features extracted from our model, showing that the model better captures the invariants and is less distracted by the shifted style. The code is available at https://github.com/POSTECH-CVLab/style-agnostic-RL.
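A minimal sketch of the min-max game the abstract describes, assuming a toy encoder/actor and a bounded style perturbation generator; the network sizes, the 0.1 perturbation scale, and the simple alternating updates are illustrative assumptions rather than the paper's training loop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
actor = nn.Linear(16, 4)                                              # 4 discrete actions (assumed)
style_gen = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())   # bounded style perturbation

opt_actor = torch.optim.Adam(list(encoder.parameters()) + list(actor.parameters()), lr=1e-4)
opt_gen = torch.optim.Adam(style_gen.parameters(), lr=1e-4)

obs = torch.rand(8, 3, 64, 64)                # a batch of image observations
target_actions = torch.randint(0, 4, (8,))    # stand-in for the policy-learning target

# Actor step: minimize the loss on style-perturbed observations (become style-agnostic).
perturbed = (obs + 0.1 * style_gen(obs)).clamp(0, 1)
loss_actor = F.cross_entropy(actor(encoder(perturbed)), target_actions)
opt_actor.zero_grad(); loss_actor.backward(); opt_actor.step()

# Generator step: maximize the same loss, i.e. find harder style perturbations (min-max game).
perturbed = (obs + 0.1 * style_gen(obs)).clamp(0, 1)
loss_gen = -F.cross_entropy(actor(encoder(perturbed)), target_actions)
opt_gen.zero_grad(); loss_gen.backward(); opt_gen.step()
```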
{"title":"Style-Agnostic Reinforcement Learning","authors":"Juyong Lee, Seokjun Ahn, Jaesik Park","doi":"10.48550/arXiv.2208.14863","DOIUrl":"https://doi.org/10.48550/arXiv.2208.14863","url":null,"abstract":"We present a novel method of learning style-agnostic representation using both style transfer and adversarial learning in the reinforcement learning framework. The style, here, refers to task-irrelevant details such as the color of the background in the images, where generalizing the learned policy across environments with different styles is still a challenge. Focusing on learning style-agnostic representations, our method trains the actor with diverse image styles generated from an inherent adversarial style perturbation generator, which plays a min-max game between the actor and the generator, without demanding expert knowledge for data augmentation or additional class labels for adversarial training. We verify that our method achieves competitive or better performances than the state-of-the-art approaches on Procgen and Distracting Control Suite benchmarks, and further investigate the features extracted from our model, showing that the model better captures the invariants and is less distracted by the shifted style. The code is available at https://github.com/POSTECH-CVLab/style-agnostic-RL.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90567685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer
Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, D. McKinnon, Yanghai Tsin, Long Quan
Generating robust and reliable correspondences across images is a fundamental task for a diversity of applications. To capture context at both global and local granularity, we propose ASpanFormer, a Transformer-based detector-free matcher that is built on hierarchical attention structure, adopting a novel attention operation which is capable of adjusting attention span in a self-adaptive manner. To achieve this goal, first, flow maps are regressed in each cross attention phase to locate the center of search region. Next, a sampling grid is generated around the center, whose size, instead of being empirically configured as fixed, is adaptively computed from a pixel uncertainty estimated along with the flow map. Finally, attention is computed across two images within derived regions, referred to as attention span. By these means, we are able to not only maintain long-range dependencies, but also enable fine-grained attention among pixels of high relevance that compensates essential locality and piece-wise smoothness in matching tasks. State-of-the-art accuracy on a wide range of evaluation benchmarks validates the strong matching capability of our method.
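The adaptive-span mechanism can be sketched as follows: a regressed match center plus a pixel-uncertainty estimate define a sampling grid in the other image, and attention is computed only over the sampled window. The grid construction and scaling below are assumptions for illustration, not ASpanFormer's exact formulation.

```python
import torch
import torch.nn.functional as F

def adaptive_span_attention(query, src_feat, center, sigma, k=5):
    """
    query:    (B, C)       one query descriptor per pair (kept tiny for clarity)
    src_feat: (B, C, H, W) source-image feature map
    center:   (B, 2)       predicted match location in normalized coords [-1, 1] (from the flow map)
    sigma:    (B, 1)       pixel-uncertainty estimate that controls the span size
    """
    B, C, H, W = src_feat.shape
    lin = torch.linspace(-1.0, 1.0, k, device=src_feat.device)
    dy, dx = torch.meshgrid(lin, lin, indexing="ij")
    offsets = torch.stack([dx, dy], dim=-1).view(1, k * k, 2)          # (1, k*k, 2)
    grid = center.view(B, 1, 2) + sigma.view(B, 1, 1) * offsets        # span grows with uncertainty
    sampled = F.grid_sample(src_feat, grid.view(B, k, k, 2),
                            align_corners=True)                        # (B, C, k, k)
    keys = sampled.flatten(2).transpose(1, 2)                          # (B, k*k, C)
    attn = torch.softmax((keys @ query.unsqueeze(-1)).squeeze(-1) / C ** 0.5, dim=-1)
    return (attn.unsqueeze(-1) * keys).sum(dim=1)                      # (B, C) attended message

out = adaptive_span_attention(torch.randn(2, 64), torch.randn(2, 64, 60, 80),
                              torch.zeros(2, 2), torch.full((2, 1), 0.2))
print(out.shape)  # torch.Size([2, 64])
```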
{"title":"ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer","authors":"Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, D. McKinnon, Yanghai Tsin, Long Quan","doi":"10.48550/arXiv.2208.14201","DOIUrl":"https://doi.org/10.48550/arXiv.2208.14201","url":null,"abstract":"Generating robust and reliable correspondences across images is a fundamental task for a diversity of applications. To capture context at both global and local granularity, we propose ASpanFormer, a Transformer-based detector-free matcher that is built on hierarchical attention structure, adopting a novel attention operation which is capable of adjusting attention span in a self-adaptive manner. To achieve this goal, first, flow maps are regressed in each cross attention phase to locate the center of search region. Next, a sampling grid is generated around the center, whose size, instead of being empirically configured as fixed, is adaptively computed from a pixel uncertainty estimated along with the flow map. Finally, attention is computed across two images within derived regions, referred to as attention span. By these means, we are able to not only maintain long-range dependencies, but also enable fine-grained attention among pixels of high relevance that compensates essential locality and piece-wise smoothness in matching tasks. State-of-the-art accuracy on a wide range of evaluation benchmarks validates the strong matching capability of our method.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81919749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 48
Open-Set Semi-Supervised Object Detection
Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Péter Vajda, Zijian He, Z. Kira
Recent developments for Semi-Supervised Object Detection (SSOD) have shown the promise of leveraging unlabeled data to improve an object detector. However, thus far these methods have assumed that the unlabeled data does not contain out-of-distribution (OOD) classes, which is unrealistic with larger-scale unlabeled datasets. In this paper, we consider a more practical yet challenging problem, Open-Set Semi-Supervised Object Detection (OSSOD). We first find the existing SSOD method obtains a lower performance gain in open-set conditions, and this is caused by the semantic expansion, where the distracting OOD objects are mispredicted as in-distribution pseudo-labels for the semi-supervised training. To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods. With the extensive studies, we found that leveraging an offline OOD detector based on a self-supervised vision transformer performs favorably against online OOD detectors due to its robustness to the interference of pseudo-labeling. In the experiment, our proposed framework effectively addresses the semantic expansion issue and shows consistent improvements on many OSSOD benchmarks, including large-scale COCO-OpenImages. We also verify the effectiveness of our framework under different OSSOD conditions, including varying numbers of in-distribution classes, different degrees of supervision, and different combinations of unlabeled sets.
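The offline OOD-filtering step can be illustrated with a small sketch: pseudo-labels whose OOD score (e.g., from a frozen self-supervised vision transformer) exceeds a threshold are dropped before semi-supervised training, which is the mechanism meant to curb semantic expansion. The PseudoLabel structure, scoring, and thresholds below are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PseudoLabel:
    box: tuple          # (x1, y1, x2, y2)
    cls: int            # predicted in-distribution class
    score: float        # detector confidence
    ood_score: float    # e.g. distance to in-distribution prototypes from a frozen ViT (assumed)

def filter_pseudo_labels(labels: List[PseudoLabel],
                         conf_thresh: float = 0.7,
                         ood_thresh: float = 0.5) -> List[PseudoLabel]:
    """Keep confident, in-distribution pseudo-labels; drop likely OOD distractors."""
    return [p for p in labels if p.score >= conf_thresh and p.ood_score < ood_thresh]

batch = [PseudoLabel((0, 0, 40, 40), 1, 0.92, 0.12),   # kept
         PseudoLabel((5, 5, 60, 60), 3, 0.88, 0.81)]   # dropped: likely out-of-distribution
print(len(filter_pseudo_labels(batch)))  # 1
```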
{"title":"Open-Set Semi-Supervised Object Detection","authors":"Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Péter Vajda, Zijian He, Z. Kira","doi":"10.48550/arXiv.2208.13722","DOIUrl":"https://doi.org/10.48550/arXiv.2208.13722","url":null,"abstract":"Recent developments for Semi-Supervised Object Detection (SSOD) have shown the promise of leveraging unlabeled data to improve an object detector. However, thus far these methods have assumed that the unlabeled data does not contain out-of-distribution (OOD) classes, which is unrealistic with larger-scale unlabeled datasets. In this paper, we consider a more practical yet challenging problem, Open-Set Semi-Supervised Object Detection (OSSOD). We first find the existing SSOD method obtains a lower performance gain in open-set conditions, and this is caused by the semantic expansion, where the distracting OOD objects are mispredicted as in-distribution pseudo-labels for the semi-supervised training. To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods. With the extensive studies, we found that leveraging an offline OOD detector based on a self-supervised vision transformer performs favorably against online OOD detectors due to its robustness to the interference of pseudo-labeling. In the experiment, our proposed framework effectively addresses the semantic expansion issue and shows consistent improvements on many OSSOD benchmarks, including large-scale COCO-OpenImages. We also verify the effectiveness of our framework under different OSSOD conditions, including varying numbers of in-distribution classes, different degrees of supervision, and different combinations of unlabeled sets.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80715142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Anti-Retroactive Interference for Lifelong Learning
Runqi Wang, Yuxiang Bao, Baochang Zhang, Jianzhuang Liu, Wentao Zhu, Guodong Guo
Humans can continuously learn new knowledge. However, machine learning models suffer from drastic dropping in performance on previous tasks after learning new tasks. Cognitive science points out that the competition of similar knowledge is an important cause of forgetting. In this paper, we design a paradigm for lifelong learning based on meta-learning and associative mechanism of the brain. It tackles the problem from two aspects: extracting knowledge and memorizing knowledge. First, we disrupt the sample’s background distribution through a background attack, which strengthens the model to extract the key features of each task. Second, according to the similarity between incremental knowledge and base knowledge, we design an adaptive fusion of incremental knowledge, which helps the model allocate capacity to the knowledge of different difficulties. It is theoretically analyzed that the proposed learning paradigm can make the models of different tasks converge to the same optimum. The proposed method is validated on the MNIST, CIFAR100, CUB200 and ImageNet100 datasets. The code is available at https://github.com/bhrqw/ARI.
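One way to picture the similarity-driven adaptive fusion is the sketch below, where incremental and base features are blended with a weight derived from their cosine similarity; the gating rule is an assumption for illustration, not the paper's fusion module.

```python
import torch
import torch.nn.functional as F

def adaptive_fusion(base_feat: torch.Tensor, inc_feat: torch.Tensor) -> torch.Tensor:
    """base_feat, inc_feat: (B, C) features from the base and incremental branches."""
    sim = F.cosine_similarity(base_feat, inc_feat, dim=1, eps=1e-8)   # (B,)
    alpha = 0.5 * (1.0 + sim).unsqueeze(1)   # similar knowledge -> lean more on the base branch
    return alpha * base_feat + (1.0 - alpha) * inc_feat

fused = adaptive_fusion(torch.randn(4, 128), torch.randn(4, 128))
print(fused.shape)  # torch.Size([4, 128])
```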
{"title":"Anti-Retroactive Interference for Lifelong Learning","authors":"Runqi Wang, Yuxiang Bao, Baochang Zhang, Jianzhuang Liu, Wentao Zhu, Guodong Guo","doi":"10.48550/arXiv.2208.12967","DOIUrl":"https://doi.org/10.48550/arXiv.2208.12967","url":null,"abstract":". Humans can continuously learn new knowledge. However, machine learning models suffer from drastic dropping in performance on previous tasks after learning new tasks. Cognitive science points out that the competition of similar knowledge is an important cause of forgetting. In this paper, we design a paradigm for lifelong learning based on meta-learning and associative mechanism of the brain. It tackles the problem from two aspects: extracting knowledge and memorizing knowledge. First, we disrupt the sample’s background distribution through a background attack, which strengthens the model to extract the key features of each task. Second, according to the similarity between incremental knowledge and base knowledge, we design an adaptive fusion of incremental knowledge, which helps the model allocate capacity to the knowledge of different difficulties. It is theoretically analyzed that the proposed learning paradigm can make the models of different tasks converge to the same optimum. The proposed method is validated on the MNIST, CIFAR100, CUB200 and ImageNet100 datasets. The code is available at https://github.com/bhrqw/ARI .","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82486029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation
Yunyao Mao, Wen-gang Zhou, Zhenbo Lu, Jiajun Deng, Houqiang Li
In 3D action recognition, there exists rich complementary information between skeleton modalities. Nevertheless, how to model and utilize this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. Different from classic distillation solutions that transfer the knowledge of a fixed and pre-trained teacher to the student, in this work, the knowledge is continuously updated and bidirectionally distilled between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. On the one hand, the neighboring similarity distribution is introduced to model the knowledge learned in each modality, where the relational information is naturally suitable for the contrastive frameworks. On the other hand, asymmetrical configurations are used for teacher and student to stabilize the distillation process and to transfer high-confidence information between modalities. By derivation, we find that the cross-modal positive mining in previous works can be regarded as a degenerated version of our CMD. We perform extensive experiments on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II datasets. Our approach outperforms existing self-supervised methods and sets a series of new records. The code is available at: https://github.com/maoyunyao/CMD
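A minimal sketch of the cross-modal mutual distillation idea under simplifying assumptions: each modality turns its embedding's similarities to a shared bank of anchors into a distribution, and the two modalities distill these distributions into each other with a KL term (teacher side detached). The temperatures and the symmetric loss form are assumed, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def neighbor_distribution(z, bank, tau):
    """z: (B, C) embeddings; bank: (N, C) anchor embeddings -> (B, N) similarity distribution."""
    sims = F.normalize(z, dim=1) @ F.normalize(bank, dim=1).t()
    return F.softmax(sims / tau, dim=1)

def mutual_distillation_loss(z_a, z_b, bank_a, bank_b, tau_student=0.1, tau_teacher=0.05):
    # bank_a / bank_b are assumed to hold the same N anchor samples embedded by each modality.
    p_a = neighbor_distribution(z_a, bank_a, tau_student)
    p_b = neighbor_distribution(z_b, bank_b, tau_student)
    with torch.no_grad():  # each modality also acts as a (detached) teacher for the other
        t_a = neighbor_distribution(z_a, bank_a, tau_teacher)
        t_b = neighbor_distribution(z_b, bank_b, tau_teacher)
    kl = lambda t, p: F.kl_div(p.log(), t, reduction="batchmean")
    return kl(t_b, p_a) + kl(t_a, p_b)   # bidirectional: A learns from B and vice versa

loss = mutual_distillation_loss(torch.randn(8, 128), torch.randn(8, 128),
                                torch.randn(256, 128), torch.randn(256, 128))
print(loss.item())
```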
{"title":"CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation","authors":"Yunyao Mao, Wen-gang Zhou, Zhenbo Lu, Jiajun Deng, Houqiang Li","doi":"10.48550/arXiv.2208.12448","DOIUrl":"https://doi.org/10.48550/arXiv.2208.12448","url":null,"abstract":"In 3D action recognition, there exists rich complementary information between skeleton modalities. Nevertheless, how to model and utilize this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. Different from classic distillation solutions that transfer the knowledge of a fixed and pre-trained teacher to the student, in this work, the knowledge is continuously updated and bidirectionally distilled between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. On the one hand, the neighboring similarity distribution is introduced to model the knowledge learned in each modality, where the relational information is naturally suitable for the contrastive frameworks. On the other hand, asymmetrical configurations are used for teacher and student to stabilize the distillation process and to transfer high-confidence information between modalities. By derivation, we find that the cross-modal positive mining in previous works can be regarded as a degenerated version of our CMD. We perform extensive experiments on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II datasets. Our approach outperforms existing self-supervised methods and sets a series of new records. The code is available at: https://github.com/maoyunyao/CMD","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74250847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Discovering Transferable Forensic Features for CNN-generated Images Detection
Keshigeyan Chandrasegaran, Ngoc-Trung Tran, A. Binder, Ngai-Man Cheung
Visual counterfeits are increasingly causing an existential conundrum in mainstream media with rapid evolution in neural image synthesis methods. Though detection of such counterfeits has been a taxing problem in the image forensics community, a recent class of forensic detectors -- universal detectors -- are able to surprisingly spot counterfeit images regardless of generator architectures, loss functions, training datasets, and resolutions. This intriguing property suggests the possible existence of transferable forensic features (T-FF) in universal detectors. In this work, we conduct the first analytical study to discover and understand T-FF in universal detectors. Our contributions are 2-fold: 1) We propose a novel forensic feature relevance statistic (FF-RS) to quantify and discover T-FF in universal detectors and, 2) Our qualitative and quantitative investigations uncover an unexpected finding: color is a critical T-FF in universal detectors. Code and models are available at https://keshik6.github.io/transferable-forensic-features/
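The role of color as a transferable forensic feature can be probed with a simple ablation sketch: compare a detector's scores on original versus color-ablated (grayscale) inputs and measure the drop. The detector below is a stand-in module, and this is not the paper's FF-RS statistic.

```python
import torch
import torch.nn as nn

# Stand-in "universal detector" (assumed architecture, untrained) for illustration only.
detector = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

def color_sensitivity(images: torch.Tensor) -> torch.Tensor:
    """images: (B, 3, H, W) in [0, 1]. Returns mean |score change| when color is ablated."""
    gray = images.mean(dim=1, keepdim=True).repeat(1, 3, 1, 1)   # luminance-only copy
    with torch.no_grad():
        drop = detector(images).sigmoid() - detector(gray).sigmoid()
    return drop.abs().mean()

print(color_sensitivity(torch.rand(4, 3, 64, 64)).item())
```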
{"title":"Discovering Transferable Forensic Features for CNN-generated Images Detection","authors":"Keshigeyan Chandrasegaran, Ngoc-Trung Tran, A. Binder, Ngai-Man Cheung","doi":"10.48550/arXiv.2208.11342","DOIUrl":"https://doi.org/10.48550/arXiv.2208.11342","url":null,"abstract":"Visual counterfeits are increasingly causing an existential conundrum in mainstream media with rapid evolution in neural image synthesis methods. Though detection of such counterfeits has been a taxing problem in the image forensics community, a recent class of forensic detectors -- universal detectors -- are able to surprisingly spot counterfeit images regardless of generator architectures, loss functions, training datasets, and resolutions. This intriguing property suggests the possible existence of transferable forensic features (T-FF) in universal detectors. In this work, we conduct the first analytical study to discover and understand T-FF in universal detectors. Our contributions are 2-fold: 1) We propose a novel forensic feature relevance statistic (FF-RS) to quantify and discover T-FF in universal detectors and, 2) Our qualitative and quantitative investigations uncover an unexpected finding: color is a critical T-FF in universal detectors. Code and models are available at https://keshik6.github.io/transferable-forensic-features/","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88266800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization
M. Ho, Min Wu, Che-Ming Wu
While hematoxylin and eosin (H&E) is a standard staining procedure, immunohistochemistry (IHC) staining further serves as a diagnostic and prognostic method. However, acquiring special staining results requires substantial costs. Hence, we proposed a strategy for ultra-high-resolution unpaired image-to-image translation: Kernelized Instance Normalization (KIN), which preserves local information and successfully achieves seamless stain transformation with constant GPU memory usage. Given a patch, corresponding position, and a kernel, KIN computes local statistics using convolution operation. In addition, KIN can be easily plugged into most currently developed frameworks without re-training. We demonstrate that KIN achieves state-of-the-art stain transformation by replacing instance normalization (IN) layers with KIN layers in three popular frameworks and testing on two histopathological datasets. Furthermore, we manifest the generalizability of KIN with high-resolution natural images. Finally, human evaluation and several objective metrics are used to compare the performance of different approaches. Overall, this is the first successful study for the ultra-high-resolution unpaired image-to-image translation with constant space complexity. Code is available at: https://github.com/Kaminyou/URUST
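A minimal sketch of the kernelized-instance-normalization idea: local means and variances are computed with a sliding window (a convolution, here average pooling) instead of one set of per-image statistics, so neighboring patches are normalized consistently. The kernel size and the omission of learnable affine parameters are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def kernelized_instance_norm(x: torch.Tensor, kernel_size: int = 7, eps: float = 1e-5):
    """x: (B, C, H, W). Normalize each position with statistics from a local window."""
    pad = kernel_size // 2
    local_mean = F.avg_pool2d(x, kernel_size, stride=1, padding=pad, count_include_pad=False)
    local_sq = F.avg_pool2d(x * x, kernel_size, stride=1, padding=pad, count_include_pad=False)
    local_var = (local_sq - local_mean ** 2).clamp_min(0.0)
    return (x - local_mean) / torch.sqrt(local_var + eps)

out = kernelized_instance_norm(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 3, 32, 32])
```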
{"title":"Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization","authors":"M. Ho, Min Wu, Che-Ming Wu","doi":"10.48550/arXiv.2208.10730","DOIUrl":"https://doi.org/10.48550/arXiv.2208.10730","url":null,"abstract":"While hematoxylin and eosin (H&E) is a standard staining procedure, immunohistochemistry (IHC) staining further serves as a diagnostic and prognostic method. However, acquiring special staining results requires substantial costs. Hence, we proposed a strategy for ultra-high-resolution unpaired image-to-image translation: Kernelized Instance Normalization (KIN), which preserves local information and successfully achieves seamless stain transformation with constant GPU memory usage. Given a patch, corresponding position, and a kernel, KIN computes local statistics using convolution operation. In addition, KIN can be easily plugged into most currently developed frameworks without re-training. We demonstrate that KIN achieves state-of-the-art stain transformation by replacing instance normalization (IN) layers with KIN layers in three popular frameworks and testing on two histopathological datasets. Furthermore, we manifest the generalizability of KIN with high-resolution natural images. Finally, human evaluation and several objective metrics are used to compare the performance of different approaches. Overall, this is the first successful study for the ultra-high-resolution unpaired image-to-image translation with constant space complexity. Code is available at: https://github.com/Kaminyou/URUST","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75992416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Learning Visibility for Robust Dense Human Body Estimation
Chunfeng Yao, Jimei Yang, Duygu Ceylan, Yi Zhou, Yang Zhou, Ming-Hsuan Yang
Estimating 3D human pose and shape from 2D images is a crucial yet challenging task. While prior methods with model-based representations can perform reasonably well on whole-body images, they often fail when parts of the body are occluded or outside the frame. Moreover, these results usually do not faithfully capture the human silhouettes due to their limited representation power of deformable models (e.g., representing only the naked body). An alternative approach is to estimate dense vertices of a predefined template body in the image space. Such representations are effective in localizing vertices within an image but cannot handle out-of-frame body parts. In this work, we learn dense human body estimation that is robust to partial observations. We explicitly model the visibility of human joints and vertices in the x, y, and z axes separately. The visibility in x and y axes help distinguishing out-of-frame cases, and the visibility in depth axis corresponds to occlusions (either self-occlusions or occlusions by other objects). We obtain pseudo ground-truths of visibility labels from dense UV correspondences and train a neural network to predict visibility along with 3D coordinates. We show that visibility can serve as 1) an additional signal to resolve depth ordering ambiguities of self-occluded vertices and 2) a regularization term when fitting a human body model to the predictions. Extensive experiments on multiple 3D human datasets demonstrate that visibility modeling significantly improves the accuracy of human body estimation, especially for partial-body cases. Our project page with code is at: https://github.com/chhankyao/visdb.
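As a sketch of how predicted visibility can act as an additional training signal, the loss below down-weights vertices predicted to be out of frame or occluded in depth; the per-axis visibility encoding, the vertex count, and the weighting scheme are illustrative assumptions.

```python
import torch

def visibility_weighted_loss(pred_xyz, gt_xyz, vis_logits):
    """
    pred_xyz, gt_xyz: (B, V, 3) predicted / ground-truth vertex coordinates
    vis_logits:       (B, V, 3) visibility logits along x, y (in-frame) and z (not occluded)
    """
    vis = torch.sigmoid(vis_logits)               # per-axis visibility probabilities
    weight = vis.prod(dim=-1, keepdim=True)       # fully visible vertices count the most
    return (weight * (pred_xyz - gt_xyz).abs()).mean()

V = 1000  # number of template vertices (placeholder)
loss = visibility_weighted_loss(torch.randn(2, V, 3), torch.randn(2, V, 3), torch.randn(2, V, 3))
print(loss.item())
```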
{"title":"Learning Visibility for Robust Dense Human Body Estimation","authors":"Chunfeng Yao, Jimei Yang, Duygu Ceylan, Yi Zhou, Yang Zhou, Ming-Hsuan Yang","doi":"10.48550/arXiv.2208.10652","DOIUrl":"https://doi.org/10.48550/arXiv.2208.10652","url":null,"abstract":"Estimating 3D human pose and shape from 2D images is a crucial yet challenging task. While prior methods with model-based representations can perform reasonably well on whole-body images, they often fail when parts of the body are occluded or outside the frame. Moreover, these results usually do not faithfully capture the human silhouettes due to their limited representation power of deformable models (e.g., representing only the naked body). An alternative approach is to estimate dense vertices of a predefined template body in the image space. Such representations are effective in localizing vertices within an image but cannot handle out-of-frame body parts. In this work, we learn dense human body estimation that is robust to partial observations. We explicitly model the visibility of human joints and vertices in the x, y, and z axes separately. The visibility in x and y axes help distinguishing out-of-frame cases, and the visibility in depth axis corresponds to occlusions (either self-occlusions or occlusions by other objects). We obtain pseudo ground-truths of visibility labels from dense UV correspondences and train a neural network to predict visibility along with 3D coordinates. We show that visibility can serve as 1) an additional signal to resolve depth ordering ambiguities of self-occluded vertices and 2) a regularization term when fitting a human body model to the predictions. Extensive experiments on multiple 3D human datasets demonstrate that visibility modeling significantly improves the accuracy of human body estimation, especially for partial-body cases. Our project page with code is at: https://github.com/chhankyao/visdb.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78907177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Adversarial Feature Augmentation for Cross-domain Few-shot Classification
Yan Hu, A. J. Ma
Existing methods based on meta-learning predict novel-class labels for (target domain) testing tasks via meta knowledge learned from (source domain) training tasks of base classes. However, most existing works may fail to generalize to novel classes due to the probably large domain discrepancy across domains. To address this issue, we propose a novel adversarial feature augmentation (AFA) method to bridge the domain gap in few-shot learning. The feature augmentation is designed to simulate distribution variations by maximizing the domain discrepancy. During adversarial training, the domain discriminator is learned by distinguishing the augmented features (unseen domain) from the original ones (seen domain), while the domain discrepancy is minimized to obtain the optimal feature encoder. The proposed method is a plug-and-play module that can be easily integrated into existing few-shot learning methods based on meta-learning. Extensive experiments on nine datasets demonstrate the superiority of our method for cross-domain few-shot classification compared with the state of the art. Code is available at https://github.com/youthhoo/AFA_For_Few_shot_learning
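The adversarial feature augmentation idea can be sketched as a three-part setup: an augmentation module perturbs encoded features toward an "unseen domain", a domain discriminator separates augmented from original features (maximizing the measured discrepancy), and the encoder is trained so augmented features remain indistinguishable (minimizing it). The architectures, loss pairing, and single combined backward pass below are simplifications, not the paper's training procedure.

```python
import torch
import torch.nn as nn

feat_dim = 64
encoder = nn.Sequential(nn.Linear(128, feat_dim), nn.ReLU())
augmentor = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Tanh())   # feature-level perturbation
discriminator = nn.Linear(feat_dim, 1)                                # seen vs. unseen domain

bce = nn.BCEWithLogitsLoss()
x = torch.randn(16, 128)

feat = encoder(x)                      # "seen domain" features
feat_aug = feat + augmentor(feat)      # simulated "unseen domain" features

# Discriminator side: separate original (label 1) from augmented (label 0) features,
# i.e. measure the domain discrepancy introduced by the augmentation.
d_loss = bce(discriminator(feat.detach()), torch.ones(16, 1)) + \
         bce(discriminator(feat_aug.detach()), torch.zeros(16, 1))

# Encoder side: make augmented features look like originals to the discriminator,
# i.e. minimize the discrepancy so the representation stays domain-agnostic.
e_loss = bce(discriminator(feat_aug), torch.ones(16, 1))

(d_loss + e_loss).backward()   # in practice the two sides alternate with separate optimizers
print(d_loss.item(), e_loss.item())
```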
{"title":"Adversarial Feature Augmentation for Cross-domain Few-shot Classification","authors":"Yan Hu, A. J. Ma","doi":"10.48550/arXiv.2208.11021","DOIUrl":"https://doi.org/10.48550/arXiv.2208.11021","url":null,"abstract":"Existing methods based on meta-learning predict novel-class labels for (target domain) testing tasks via meta knowledge learned from (source domain) training tasks of base classes. However, most existing works may fail to generalize to novel classes due to the probably large domain discrepancy across domains. To address this issue, we propose a novel adversarial feature augmentation (AFA) method to bridge the domain gap in few-shot learning. The feature augmentation is designed to simulate distribution variations by maximizing the domain discrepancy. During adversarial training, the domain discriminator is learned by distinguishing the augmented features (unseen domain) from the original ones (seen domain), while the domain discrepancy is minimized to obtain the optimal feature encoder. The proposed method is a plug-and-play module that can be easily integrated into existing few-shot learning methods based on meta-learning. Extensive experiments on nine datasets demonstrate the superiority of our method for cross-domain few-shot classification compared with the state of the art. Code is available at https://github.com/youthhoo/AFA_For_Few_shot_learning","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91496337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16