
Latest articles from IEEE Transactions on Pattern Analysis and Machine Intelligence

Aberration-Aware Depth-from-Focus
IF 23.6 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-03-08 | DOI: 10.48550/arXiv.2303.04654
Xinge Yang, Qiang Fu, Mohammed Elhoseiny, W. Heidrich
Computer vision methods for depth estimation usually use simple camera models with idealized optics. For modern machine learning approaches, this creates an issue when attempting to train deep networks with simulated data, especially for focus-sensitive tasks like Depth-from-Focus. In this work, we investigate the domain gap caused by off-axis aberrations that will affect the selection of the best-focused frame in a focal stack. We then explore bridging this domain gap through aberration-aware training (AAT). Our approach involves a lightweight network that models lens aberrations at different positions and focus distances, which is then integrated into the conventional network training pipeline. We evaluate the generality of network models on both synthetic and real-world data. The experimental results demonstrate that the proposed AAT scheme can improve depth estimation accuracy without fine-tuning the model for different datasets. The code will be available at github.com/vccimaging/Aberration-Aware-Depth-from-Focus.
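To make the idea concrete, here is a minimal sketch of how an aberration-aware simulation step could be wired into a focal-stack training pipeline: a small MLP predicts a spatially varying point-spread function (PSF) from image-plane position and focus distance, and that PSF is applied to ideally rendered patches before they reach the depth-from-focus network. The network sizes, kernel size, and conditioning variables here are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: a lightweight MLP predicts a position- and focus-dependent blur kernel,
# which is applied to pinhole-rendered patches before depth-from-focus training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AberrationNet(nn.Module):
    def __init__(self, kernel_size=11, hidden=64):
        super().__init__()
        self.kernel_size = kernel_size
        # input: normalized (x, y) image position + focus distance -> PSF weights
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, kernel_size * kernel_size),
        )

    def forward(self, pos_xy, focus_dist):
        feat = torch.cat([pos_xy, focus_dist], dim=-1)            # (B, 3)
        k = self.mlp(feat).view(-1, 1, self.kernel_size, self.kernel_size)
        return torch.softmax(k.flatten(1), dim=1).view_as(k)      # normalized PSF

def apply_psf(patch, psf):
    # patch: (B, C, H, W), psf: (B, 1, k, k); convolve each patch with its own PSF
    B, C, H, W = patch.shape
    patch = patch.reshape(1, B * C, H, W)
    psf = psf.repeat_interleave(C, dim=0)                         # (B*C, 1, k, k)
    out = F.conv2d(patch, psf, padding=psf.shape[-1] // 2, groups=B * C)
    return out.reshape(B, C, H, W)

net = AberrationNet()
patches = torch.rand(4, 3, 64, 64)
psf = net(torch.rand(4, 2), torch.rand(4, 1))
blurred = apply_psf(patches, psf)      # fed to the DFF network instead of the ideal patches
```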
{"title":"Aberration-Aware Depth-from-Focus","authors":"Xinge Yang, Qiang Fu, Mohammed Elhoseiny, W. Heidrich","doi":"10.48550/arXiv.2303.04654","DOIUrl":"https://doi.org/10.48550/arXiv.2303.04654","url":null,"abstract":"Computer vision methods for depth estimation usually use simple camera models with idealized optics. For modern machine learning approaches, this creates an issue when attempting to train deep networks with simulated data, especially for focus-sensitive tasks like Depth-from-Focus. In this work, we investigate the domain gap caused by off-axis aberrations that will affect the decision of the best-focused frame in a focal stack. We then explore bridging this domain gap through aberration-aware training (AAT). Our approach involves a lightweight network that models lens aberrations at different positions and focus distances, which is then integrated into the conventional network training pipeline. We evaluate the generality of network models on both synthetic and real-world data. The experimental results demonstrate that the proposed AAT scheme can improve depth estimation accuracy without fine-tuning the model for different datasets. The code will be available in github.com/vccimaging/Aberration-Aware-Depth-from-Focus.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46730395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Hierarchical Optimization-Derived Learning
IF 23.6 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-02-11 | DOI: 10.48550/arXiv.2302.05587
Risheng Liu, Xuan Liu, Shangzhi Zeng, Jin Zhang, Yixuan Zhang
In recent years, by utilizing optimization techniques to formulate the propagation of deep models, a variety of so-called Optimization-Derived Learning (ODL) approaches have been proposed to address diverse learning and vision tasks. Although they achieve relatively satisfying practical performance, fundamental issues remain in existing ODL methods. In particular, current ODL methods tend to consider model construction and learning as two separate phases, and thus fail to formulate their underlying coupling and dependency relationship. In this work, we first establish a new framework, named Hierarchical ODL (HODL), to simultaneously investigate the intrinsic behaviors of optimization-derived model construction and its corresponding learning process. Then we rigorously prove the joint convergence of these two sub-tasks, from the perspectives of both approximation quality and stationary analysis. To the best of our knowledge, this is the first theoretical guarantee for these two coupled ODL components: optimization and learning. We further demonstrate the flexibility of our framework by applying HODL to challenging learning tasks, which have not been properly addressed by existing ODL methods. Finally, we conduct extensive experiments on both synthetic data and real applications in vision and other learning tasks to verify the theoretical properties and practical performance of HODL in various application scenarios.
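As a rough illustration of optimization-derived learning in general (not the HODL algorithm itself), the sketch below unrolls a fixed number of gradient steps on an inner objective and trains the unrolled scheme's step sizes and learned prior jointly with the outer task loss; the quadratic inner objective and all problem sizes are assumptions for illustration.

```python
# Sketch: an optimization-derived model built by unrolling T gradient steps,
# with the unrolled scheme's parameters trained jointly on an outer task loss.
import torch
import torch.nn as nn

class UnrolledODL(nn.Module):
    def __init__(self, dim, steps=10):
        super().__init__()
        self.steps = steps
        self.step_sizes = nn.Parameter(torch.full((steps,), 0.1))  # learnable step sizes
        self.prior = nn.Linear(dim, dim)                            # learned prior / regularizer

    def forward(self, y, A):
        # inner task: recover x from y = A x by unrolled gradient descent
        x = torch.zeros(y.shape[0], A.shape[1], device=y.device)
        for t in range(self.steps):
            grad = (x @ A.T - y) @ A            # gradient of 0.5 * ||A x - y||^2
            x = x - self.step_sizes[t] * (grad + self.prior(x))
        return x

# outer learning: optimize both the unrolled scheme (model construction) and the
# learned prior against a task loss, so the two phases are trained jointly
model = UnrolledODL(dim=32)
A = torch.randn(16, 32)
x_true = torch.randn(8, 32)
y = x_true @ A.T
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = ((model(y, A) - x_true) ** 2).mean()
loss.backward()
opt.step()
```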
{"title":"Hierarchical Optimization-Derived Learning","authors":"Risheng Liu, Xuan Liu, Shangzhi Zeng, Jin Zhang, Yixuan Zhang","doi":"10.48550/arXiv.2302.05587","DOIUrl":"https://doi.org/10.48550/arXiv.2302.05587","url":null,"abstract":"In recent years, by utilizing optimization techniques to formulate the propagation of deep model, a variety of so-called Optimization-Derived Learning (ODL) approaches have been proposed to address diverse learning and vision tasks. Although having achieved relatively satisfying practical performance, there still exist fundamental issues in existing ODL methods. In particular, current ODL methods tend to consider model constructing and learning as two separate phases, and thus fail to formulate their underlying coupling and depending relationship. In this work, we first establish a new framework, named Hierarchical ODL (HODL), to simultaneously investigate the intrinsic behaviors of optimization-derived model construction and its corresponding learning process. Then we rigorously prove the joint convergence of these two sub-tasks, from the perspectives of both approximation quality and stationary analysis. To our best knowledge, this is the first theoretical guarantee for these two coupled ODL components: optimization and learning. We further demonstrate the flexibility of our framework by applying HODL to challenging learning tasks, which have not been properly addressed by existing ODL methods. Finally, we conduct extensive experiments on both synthetic data and real applications in vision and other learning tasks to verify the theoretical properties and practical performance of HODL in various application scenarios.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48910693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections
IF 23.6 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-02-02 | DOI: 10.48550/arXiv.2302.01330
Zhaoxi Chen, Guangcong Wang, Ziwei Liu
In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. This BEV scene representation enables 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid to parameterize the latent space based on 3D positions and scene semantics, aiming to encode generalizable features across various scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and its superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds. The project page is available at https://scene-dreamer.github.io/. Code is available at https://github.com/FrozenBurning/SceneDreamer.
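A minimal sketch of the BEV scene representation described above: a height field for elevation and a discrete semantic field, both generated procedurally. Smoothed Gaussian noise stands in here for the simplex noise used in the paper, and the resolution and number of semantic classes are arbitrary choices.

```python
# Sketch: build a bird's-eye-view (BEV) representation with a height field and a
# semantic field from procedural noise (a stand-in for simplex noise).
import numpy as np
from scipy.ndimage import gaussian_filter

def make_bev(size=256, num_classes=6, seed=0):
    rng = np.random.default_rng(seed)
    # height field: low-frequency noise as terrain elevation, normalized to [0, 1]
    height = gaussian_filter(rng.standard_normal((size, size)), sigma=16)
    height = (height - height.min()) / (height.max() - height.min())
    # semantic field: quantize a second, smoother noise channel into discrete labels
    sem = gaussian_filter(rng.standard_normal((size, size)), sigma=24)
    edges = np.quantile(sem, np.linspace(0, 1, num_classes + 1)[1:-1])
    sem = np.digitize(sem, edges)
    return height, sem          # (H, W) float elevation, (H, W) int class labels

height, semantics = make_bev()
print(height.shape, semantics.min(), semantics.max())
```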
{"title":"SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections","authors":"Zhaoxi Chen, Guangcong Wang, Ziwei Liu","doi":"10.48550/arXiv.2302.01330","DOIUrl":"https://doi.org/10.48550/arXiv.2302.01330","url":null,"abstract":"In this work, we present, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of is a principled learning paradigm comprising 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. This BEV scene representation enables 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid to parameterize the latent space based on 3D positions and scene semantics, aiming to encode generalizable features across various scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of and superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds. Project Page is available at https://scene-dreamer.github.io/. Code is available at https://github.com/FrozenBurning/SceneDreamer.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48997194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Booster: a Benchmark for Depth from Images of Specular and Transparent Surfaces
IF 23.6 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-19 | DOI: 10.48550/arXiv.2301.08245
Pierluigi Zama Ramirez, Alex Costanzino, F. Tosi, Matteo Poggi, Samuele Salti, S. Mattoccia, L. D. Stefano
Estimating depth from images nowadays yields outstanding results, both in terms of in-domain accuracy and generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. To this end, we propose a novel dataset that includes accurate and dense ground-truth labels at high resolution, featuring scenes containing several specular and transparent surfaces. Our acquisition pipeline leverages a novel deep space-time stereo framework, enabling easy and accurate labeling with sub-pixel precision. The dataset is composed of 606 samples collected in 85 different scenes; each sample includes both a high-resolution pair (12 Mpx) and an unbalanced stereo pair (Left: 12 Mpx, Right: 1.1 Mpx), typical of modern mobile devices that mount sensors with different resolutions. Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. The dataset is composed of a train set and two test sets, the latter devoted to the evaluation of stereo and monocular depth estimation networks. Our experiments highlight the open challenges and future research directions in this field.
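As a small illustration of the unbalanced stereo setting (hypothetical code, not the benchmark's tooling), the low-resolution right view can be upsampled to the left view's resolution before being passed to a stereo network:

```python
# Sketch: bring an unbalanced stereo pair to a common resolution by upsampling
# the low-resolution right view to match the high-resolution left view.
import torch
import torch.nn.functional as F

def balance_pair(left_hi, right_lo):
    # left_hi: (1, 3, H, W) full-resolution view; right_lo: (1, 3, h, w) low-res view
    right_up = F.interpolate(right_lo, size=left_hi.shape[-2:],
                             mode="bilinear", align_corners=False)
    return left_hi, right_up

# toy resolutions for illustration (the benchmark uses ~12 Mpx and ~1.1 Mpx frames)
left = torch.rand(1, 3, 768, 1024)
right = torch.rand(1, 3, 232, 308)
left, right = balance_pair(left, right)
print(right.shape)      # torch.Size([1, 3, 768, 1024])
```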
{"title":"Booster: a Benchmark for Depth from Images of Specular and Transparent Surfaces","authors":"Pierluigi Zama Ramirez, Alex Costanzino, F. Tosi, Matteo Poggi, Samuele Salti, S. Mattoccia, L. D. Stefano","doi":"10.48550/arXiv.2301.08245","DOIUrl":"https://doi.org/10.48550/arXiv.2301.08245","url":null,"abstract":"Estimating depth from images nowadays yields outstanding results, both in terms of in-domain accuracy and generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. Purposely, we propose a novel dataset that includes accurate and dense ground-truth labels at high resolution, featuring scenes containing several specular and transparent surfaces. Our acquisition pipeline leverages a novel deep space-time stereo framework, enabling easy and accurate labeling with sub-pixel precision. The dataset is composed of 606 samples collected in 85 different scenes, each sample includes both a high-resolution pair (12 Mpx) as well as an unbalanced stereo pair (Left: 12 Mpx, Right: 1.1 Mpx), typical of modern mobile devices that mount sensors with different resolutions. Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. The dataset is composed of a train set and two test sets, the latter devoted to the evaluation of stereo and monocular depth estimation networks. Our experiments highlight the open challenges and future research directions in this field.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43552731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Dataset Distillation: A Comprehensive Review
IF 23.6 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-17 | DOI: 10.48550/arXiv.2301.07014
Ruonan Yu, Songhua Liu, Xinchao Wang
Recent success of deep learning is largely attributed to the sheer amount of data used for training deep neural networks. Despite the unprecedented success, the massive data, unfortunately, significantly increases the burden on storage and transmission and further gives rise to a cumbersome model training process. Besides, relying on the raw data for training per se yields concerns about privacy and copyright. To alleviate these shortcomings, dataset distillation (DD), also known as dataset condensation (DC), was introduced and has recently attracted much research attention in the community. Given an original dataset, DD aims to derive a much smaller dataset containing synthetic samples, based on which the trained models yield performance comparable with those trained on the original dataset. In this paper, we give a comprehensive review and summary of recent advances in DD and its application. We first introduce the task formally and propose an overall algorithmic framework followed by all existing DD methods. Next, we provide a systematic taxonomy of current methodologies in this area, and discuss their theoretical interconnections. We also present current challenges in DD through extensive empirical studies and envision possible directions for future works.
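For readers unfamiliar with the task, the sketch below shows the common bilevel formulation of dataset distillation: synthetic samples are optimized so that a model briefly trained on them performs well on real data. The linear model, single unrolled inner step, and learning rates are illustrative assumptions, not a specific method from the review.

```python
# Sketch: dataset distillation as a bilevel problem with one differentiable inner step.
import torch
import torch.nn as nn
import torch.nn.functional as F

real_x, real_y = torch.randn(256, 784), torch.randint(0, 10, (256,))
syn_x = torch.randn(10, 784, requires_grad=True)      # one synthetic image per class
syn_y = torch.arange(10)
outer_opt = torch.optim.Adam([syn_x], lr=0.1)

for it in range(100):
    model = nn.Linear(784, 10)                          # re-initialized each outer iteration
    w, b = model.weight, model.bias
    # inner step: one SGD update on the synthetic set, kept differentiable
    inner_loss = F.cross_entropy(syn_x @ w.T + b, syn_y)
    gw, gb = torch.autograd.grad(inner_loss, (w, b), create_graph=True)
    w2, b2 = w - 0.1 * gw, b - 0.1 * gb
    # outer step: evaluate the updated model on real data, backprop into syn_x
    outer_loss = F.cross_entropy(real_x @ w2.T + b2, real_y)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```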
{"title":"Dataset Distillation: A Comprehensive Review","authors":"Ruonan Yu, Songhua Liu, Xinchao Wang","doi":"10.48550/arXiv.2301.07014","DOIUrl":"https://doi.org/10.48550/arXiv.2301.07014","url":null,"abstract":"Recent success of deep learning is largely attributed to the sheer amount of data used for training deep neural networks. Despite the unprecedented success, the massive data, unfortunately, significantly increases the burden on storage and transmission and further gives rise to a cumbersome model training process. Besides, relying on the raw data for training per se yields concerns about privacy and copyright. To alleviate these shortcomings, dataset distillation (DD), also known as dataset condensation (DC), was introduced and has recently attracted much research attention in the community. Given an original dataset, DD aims to derive a much smaller dataset containing synthetic samples, based on which the trained models yield performance comparable with those trained on the original dataset. In this paper, we give a comprehensive review and summary of recent advances in DD and its application. We first introduce the task formally and propose an overall algorithmic framework followed by all existing DD methods. Next, we provide a systematic taxonomy of current methodologies in this area, and discuss their theoretical interconnections. We also present current challenges in DD through extensive empirical studies and envision possible directions for future works.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47087034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Pixel-Perfect Structure-From-Motion With Featuremetric Refinement
IF 23.6 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-16 | DOI: 10.1109/TPAMI.2023.3237269
Paul-Edouard Sarlin, Philipp Lindenberger, Viktor Larsson, Marc Pollefeys

Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per-image once and for all, which can yield poorly-localized features and propagate large errors to the final geometry. In this paper, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale. Our code is publicly available at https://github.com/cvg/pixel-perfect-sfm as an add-on to the popular Structure-from-Motion software COLMAP.
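The following sketch illustrates the featuremetric refinement idea under simplifying assumptions (it is not the released COLMAP add-on): sub-pixel keypoint locations in several views are adjusted so that dense features sampled at those locations agree across views.

```python
# Sketch: refine keypoint locations by minimizing a featuremetric error over
# dense feature maps sampled with differentiable bilinear interpolation.
import torch
import torch.nn.functional as F

def sample_features(feat_map, xy, hw):
    # feat_map: (1, C, H, W); xy: (N, 2) pixel coords -> (N, C) bilinear samples
    h, w = hw
    gx = 2 * xy[:, 0] / (w - 1) - 1                      # normalize x to [-1, 1]
    gy = 2 * xy[:, 1] / (h - 1) - 1                      # normalize y to [-1, 1]
    grid = torch.stack([gx, gy], dim=-1).view(1, -1, 1, 2)
    out = F.grid_sample(feat_map, grid, align_corners=True)   # (1, C, N, 1)
    return out[0, :, :, 0].T

feats = [torch.randn(1, 64, 60, 80) for _ in range(3)]        # dense features from 3 views
kps = [torch.tensor([[40.0, 30.0]], requires_grad=True) for _ in range(3)]
opt = torch.optim.Adam(kps, lr=0.05)

for _ in range(50):
    sampled = torch.stack([sample_features(f, k, (60, 80)) for f, k in zip(feats, kps)])
    loss = (sampled - sampled.mean(dim=0, keepdim=True)).pow(2).mean()  # featuremetric error
    opt.zero_grad()
    loss.backward()
    opt.step()
```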

{"title":"Pixel-Perfect Structure-From-Motion With Featuremetric Refinement.","authors":"Paul-Edouard Sarlin, Philipp Lindenberger, Viktor Larsson, Marc Pollefeys","doi":"10.1109/TPAMI.2023.3237269","DOIUrl":"10.1109/TPAMI.2023.3237269","url":null,"abstract":"<p><p>Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per-image once and for all, which can yield poorly-localized features and propagate large errors to the final geometry. In this paper, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale. Our code is publicly available at https://github.com/cvg/pixel-perfect-sfm as an add-on to the popular Structure-from-Motion software COLMAP.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9252417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SPTS v2: Single-Point Scene Text Spotting
IF 23.6 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-04 | DOI: 10.48550/arXiv.2301.01635
Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Ji Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
End-to-end scene text spotting has made significant progress due to its intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive than using a single point. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using a single-point annotation. SPTS v2 retains the advantage of the auto-regressive Transformer with an Instance Assignment Decoder (IAD) that sequentially predicts the center points of all text instances inside the same predicting sequence, while using a Parallel Recognition Decoder (PRD) for text recognition in parallel, which significantly reduces the required sequence length. These two decoders share the same parameters and are interactively connected with a simple but effective information transmission process to pass the gradient and information. Comprehensive experiments on various existing benchmark datasets demonstrate that SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters while achieving 19× faster inference speed. Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms. Code is available at: https://github.com/Yuliang-Liu/SPTSv2.
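As a small illustration of single-point annotation in a sequence-prediction setting (an assumption-laden sketch, not the SPTS v2 code), center points can be quantized into discrete coordinate tokens for an auto-regressive decoder to predict:

```python
# Sketch: serialize single-point annotations into a discrete token stream by
# quantizing normalized coordinates into a fixed number of bins.
import torch

def points_to_tokens(points, img_w, img_h, bins=1000):
    # points: (N, 2) center points in pixels -> interleaved x/y tokens in [0, bins)
    x = (points[:, 0] / img_w * bins).long().clamp(0, bins - 1)
    y = (points[:, 1] / img_h * bins).long().clamp(0, bins - 1)
    return torch.stack([x, y], dim=1).flatten()

pts = torch.tensor([[120.5, 340.0], [600.2, 80.7]])
print(points_to_tokens(pts, img_w=800, img_h=600))   # tensor([150, 566, 750, 134])
```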
{"title":"SPTS v2: Single-Point Scene Text Spotting","authors":"Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Ji Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin","doi":"10.48550/arXiv.2301.01635","DOIUrl":"https://doi.org/10.48550/arXiv.2301.01635","url":null,"abstract":"End-to-end scene text spotting has made significant progress due to its intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive than using single-point. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using a single-point annotation. SPTS v2 reserves the advantage of the auto-regressive Transformer with an Instance Assignment Decoder (IAD) through sequentially predicting the center points of all text instances inside the same predicting sequence, while with a Parallel Recognition Decoder (PRD) for text recognition in parallel, which significantly reduces the requirement of the length of the sequence. These two decoders share the same parameters and are interactively connected with a simple but effective information transmission process to pass the gradient and information. Comprehensive experiments on various existing benchmark datasets demonstrate the SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters while achieving 19× faster inference speed. Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms. Code is available at: https://github.com/Yuliang-Liu/SPTSv2.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43034815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Generalizable Black-Box Adversarial Attack with Meta Learning
IF 23.6 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-01 | DOI: 10.48550/arXiv.2301.00364
Fei Yin, Yong Zhang, Baoyuan Wu, Yan Feng, Jingyi Zhang, Yanbo Fan, Yujiu Yang
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-train procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments. The source code is available at https://github.com/SCLBD/MCG-Blackbox.
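A minimal sketch of the surrogate-based meta-training stage under illustrative assumptions (the query-feedback fine-tuning against the true black-box target is not shown): a perturbation generator is trained against a white-box surrogate so that it can later be adapted quickly per benign example.

```python
# Sketch: meta-train a perturbation generator on a white-box surrogate classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

surrogate = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # stand-in classifier
generator = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())
opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
eps = 8 / 255                                                          # L-inf budget

x = torch.rand(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))
for _ in range(10):
    delta = eps * generator(x)                   # bounded perturbation per example
    logits = surrogate((x + delta).clamp(0, 1))
    loss = -F.cross_entropy(logits, y)           # push the surrogate away from true labels
    opt.zero_grad()
    loss.backward()
    opt.step()
```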
{"title":"Generalizable Black-Box Adversarial Attack with Meta Learning","authors":"Fei Yin, Yong Zhang, Baoyuan Wu, Yan Feng, Jingyi Zhang, Yanbo Fan, Yujiu Yang","doi":"10.48550/arXiv.2301.00364","DOIUrl":"https://doi.org/10.48550/arXiv.2301.00364","url":null,"abstract":"In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-train procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments. The source code is available at https://github.com/SCLBD/MCG-Blackbox.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45976317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Learning Implicit Functions for Dense 3D Shape Correspondence of Generic Objects
IF 23.6 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-12-29 | DOI: 10.48550/arXiv.2212.14276
Feng Liu, Xiaoming Liu
The objective of this paper is to learn dense 3D shape correspondence for topology-varying generic objects in an unsupervised manner. Conventional implicit functions estimate the occupancy of a 3D point given a shape latent code. Instead, our novel implicit function produces a probabilistic embedding to represent each 3D point in a part embedding space. Assuming the corresponding points are similar in the embedding space, we implement dense correspondence through an inverse function mapping from the part embedding vector to a corresponded 3D point. Both functions are jointly learned with several effective and uncertainty-aware loss functions to realize our assumption, together with the encoder generating the shape latent code. During inference, if a user selects an arbitrary point on the source shape, our algorithm can automatically generate a confidence score indicating whether there is a correspondence on the target shape, as well as the corresponding semantic point if there is one. Such a mechanism inherently benefits man-made objects with different part constitutions. The effectiveness of our approach is demonstrated through unsupervised 3D semantic correspondence and shape segmentation.
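The sketch below illustrates the forward/inverse implicit-function pairing described above (network sizes and code dimensions are assumptions): points on a source shape are mapped into a shared part-embedding space and decoded back onto a target shape to obtain correspondences.

```python
# Sketch: a forward implicit function maps (point, shape code) -> part embedding,
# and an inverse function maps (embedding, shape code) -> point on another shape.
import torch
import torch.nn as nn

class PointToEmbedding(nn.Module):
    def __init__(self, code_dim=128, emb_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3 + code_dim, 256), nn.ReLU(),
                                 nn.Linear(256, emb_dim))
    def forward(self, p, code):
        return self.net(torch.cat([p, code.expand(p.shape[0], -1)], dim=-1))

class EmbeddingToPoint(nn.Module):
    def __init__(self, code_dim=128, emb_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(emb_dim + code_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 3))
    def forward(self, e, code):
        return self.net(torch.cat([e, code.expand(e.shape[0], -1)], dim=-1))

f, g = PointToEmbedding(), EmbeddingToPoint()
code_a, code_b = torch.randn(1, 128), torch.randn(1, 128)   # latent codes of two shapes
p_src = torch.randn(100, 3)              # points sampled on the source shape
emb = f(p_src, code_a)                   # embeddings in the shared part space
p_corr = g(emb, code_b)                  # corresponding points on the target shape
```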
{"title":"Learning Implicit Functions for Dense 3D Shape Correspondence of Generic Objects","authors":"Feng Liu, Xiaoming Liu","doi":"10.48550/arXiv.2212.14276","DOIUrl":"https://doi.org/10.48550/arXiv.2212.14276","url":null,"abstract":"The objective of this paper is to learn dense 3D shape correspondence for topology-varying generic objects in an unsupervised manner. Conventional implicit functions estimate the occupancy of a 3D point given a shape latent code. Instead, our novel implicit function produces a probabilistic embedding to represent each 3D point in a part embedding space. Assuming the corresponding points are similar in the embedding space, we implement dense correspondence through an inverse function mapping from the part embedding vector to a corresponded 3D point. Both functions are jointly learned with several effective and uncertainty-aware loss functions to realize our assumption, together with the encoder generating the shape latent code. During inference, if a user selects an arbitrary point on the source shape, our algorithm can automatically generate a confidence score indicating whether there is a correspondence on the target shape, as well as the corresponding semantic point if there is one. Such a mechanism inherently benefits man-made objects with different part constitutions. The effectiveness of our approach is demonstrated through unsupervised 3D semantic correspondence and shape segmentation.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2022-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45446877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Regularized Optimal Transport Layers for Generalized Global Pooling Operations
IF 23.6 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-12-13 | DOI: 10.48550/arXiv.2212.06339
Hongteng Xu, Minjie Cheng
Global pooling is one of the most significant operations in many machine learning models and tasks, which works for information fusion and structured data (like sets and graphs) representation. However, without solid mathematical fundamentals, its practical implementations often depend on empirical mechanisms and thus lead to sub-optimal, even unsatisfactory performance. In this work, we develop a novel and generalized global pooling framework through the lens of optimal transport. The proposed framework is interpretable from the perspective of expectation-maximization. Essentially, it aims at learning an optimal transport across sample indices and feature dimensions, making the corresponding pooling operation maximize the conditional expectation of input data. We demonstrate that most existing pooling methods are equivalent to solving a regularized optimal transport (ROT) problem with different specializations, and more sophisticated pooling operations can be implemented by hierarchically solving multiple ROT problems. Making the parameters of the ROT problem learnable, we develop a family of regularized optimal transport pooling (ROTP) layers. We implement the ROTP layers as a new kind of deep implicit layer. Their model architectures correspond to different optimization algorithms. We test our ROTP layers in several representative set-level machine learning scenarios, including multi-instance learning (MIL), graph classification, graph set representation, and image classification. Experimental results show that applying our ROTP layers can reduce the difficulty of the design and selection of global pooling - our ROTP layers may either imitate some existing global pooling methods or lead to some new pooling layers fitting data better. The code is available at https://github.com/SDS-Lab/ROT-Pooling.
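As a rough illustration of pooling phrased as regularized optimal transport (a sketch under simplifying assumptions, not the released ROTP layers), a few Sinkhorn iterations produce a transport plan between set elements and feature dimensions, which then weights the features during aggregation:

```python
# Sketch: entropic-OT pooling — a Sinkhorn-scaled transport plan over a set of
# feature vectors determines how each element contributes to the pooled output.
import torch

def sinkhorn_pool(x, n_iters=20, eps=0.5):
    # x: (N, D) set of feature vectors -> (D,) pooled vector
    cost = -x                                    # prefer transporting mass to large activations
    K = torch.exp(-cost / eps)                   # (N, D) Gibbs kernel
    a = torch.full((x.shape[0],), 1.0 / x.shape[0])   # uniform mass over samples
    b = torch.full((x.shape[1],), 1.0 / x.shape[1])   # uniform mass over feature dims
    u, v = torch.ones_like(a), torch.ones_like(b)
    for _ in range(n_iters):                     # Sinkhorn scaling iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]           # (N, D) transport plan
    # each column of the plan sums to 1/D, so rescale to get per-dimension weights
    return (plan * x).sum(dim=0) * x.shape[1]

pooled = sinkhorn_pool(torch.randn(32, 64))
print(pooled.shape)                              # torch.Size([64])
```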
{"title":"Regularized Optimal Transport Layers for Generalized Global Pooling Operations","authors":"Hongteng Xu, Minjie Cheng","doi":"10.48550/arXiv.2212.06339","DOIUrl":"https://doi.org/10.48550/arXiv.2212.06339","url":null,"abstract":"Global pooling is one of the most significant operations in many machine learning models and tasks, which works for information fusion and structured data (like sets and graphs) representation. However, without solid mathematical fundamentals, its practical implementations often depend on empirical mechanisms and thus lead to sub-optimal, even unsatisfactory performance. In this work, we develop a novel and generalized global pooling framework through the lens of optimal transport. The proposed framework is interpretable from the perspective of expectation-maximization. Essentially, it aims at learning an optimal transport across sample indices and feature dimensions, making the corresponding pooling operation maximize the conditional expectation of input data. We demonstrate that most existing pooling methods are equivalent to solving a regularized optimal transport (ROT) problem with different specializations, and more sophisticated pooling operations can be implemented by hierarchically solving multiple ROT problems. Making the parameters of the ROT problem learnable, we develop a family of regularized optimal transport pooling (ROTP) layers. We implement the ROTP layers as a new kind of deep implicit layer. Their model architectures correspond to different optimization algorithms. We test our ROTP layers in several representative set-level machine learning scenarios, including multi-instance learning (MIL), graph classification, graph set representation, and image classification. Experimental results show that applying our ROTP layers can reduce the difficulty of the design and selection of global pooling - our ROTP layers may either imitate some existing global pooling methods or lead to some new pooling layers fitting data better. The code is available at https://github.com/SDS-Lab/ROT-Pooling.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44449444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2