
Latest Publications in IEEE Transactions on Pattern Analysis and Machine Intelligence

Isolating Signals in Passive Non-Line-of-Sight Imaging using Spectral Content.
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-08-02 DOI: 10.1109/TPAMI.2023.3301336
Connor Hashemi, Rafael Avelar, James Leger

In real-life passive non-line-of-sight (NLOS) imaging, there is an overwhelming amount of undesired scattered radiance, called clutter, that impedes reconstruction of the desired NLOS scene. This paper explores using the spectral domain of the scattered light field to separate the desired scattered radiance from the clutter. We propose two techniques. The first separates the multispectral scattered radiance into a collection of objects, each with its own uniform color. The objects corresponding to clutter can then be identified and removed based on how well they can be reconstructed using NLOS imaging algorithms. This technique requires very few priors and uses off-the-shelf algorithms. For the second technique, we derive and solve a convex optimization problem assuming we know the desired signal's spectral content. This method is quicker and can be performed with fewer spectral measurements. We demonstrate both techniques in realistic scenarios. In the presence of clutter that is 50 times stronger than the desired signal, the proposed reconstruction of the NLOS scene is 23 times more accurate than typical reconstructions and 5 times more accurate than the leading clutter-rejection method.
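The second technique can be pictured with a toy spectral-unmixing example. The sketch below is our own illustration under simplifying assumptions: a plain least-squares fit stands in for the paper's convex program, and all spectra and abundances are synthetic.

```python
# Hedged sketch: separate a desired NLOS signal from clutter by spectral
# unmixing, assuming the signal spectrum `s_sig` is known (as in the paper's
# second technique) and clutter spectra `S_clut` have been estimated somehow.
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_pix = 8, 1000

s_sig = rng.random(n_bands)               # assumed known signal spectrum
S_clut = rng.random((n_bands, 3))         # hypothetical clutter spectra

# Ground-truth abundances, used only to synthesize measurements.
a_sig = rng.random(n_pix)
A_clut = 50.0 * rng.random((3, n_pix))    # clutter ~50x stronger than signal

Y = (np.outer(s_sig, a_sig) + S_clut @ A_clut
     + 0.01 * rng.standard_normal((n_bands, n_pix)))

# Least-squares unmixing: solve Y ~= [s_sig | S_clut] @ abundances per pixel.
M = np.column_stack([s_sig, S_clut])
abund, *_ = np.linalg.lstsq(M, Y, rcond=None)

signal_only = np.outer(s_sig, abund[0])   # radiance attributed to the signal
print("mean abundance error:", np.abs(abund[0] - a_sig).mean())
```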

Citations: 0
Digging Into Uncertainty-based Pseudo-label for Robust Stereo Matching
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-07-31 DOI: 10.48550/arXiv.2307.16509
Zhelun Shen, Xibin Song, Yuchao Dai, Dingfu Zhou, Zhibo Rao, Liangjun Zhang
Due to domain differences and an unbalanced disparity distribution across multiple datasets, current stereo matching approaches are commonly limited to a specific dataset and generalize poorly to others. This domain shift issue is usually addressed by substantial adaptation on costly target-domain ground-truth data, which cannot be easily obtained in practical settings. In this paper, we propose to dig into uncertainty estimation for robust stereo matching. Specifically, to balance the disparity distribution, we employ pixel-level uncertainty estimation to adaptively adjust the disparity search space of the next stage, driving the network to progressively prune out the space of unlikely correspondences. Then, to cope with the scarcity of ground-truth data, an uncertainty-based pseudo-label is proposed to adapt the pre-trained model to the new domain: pixel-level and area-level uncertainty estimation filter out the high-uncertainty pixels of predicted disparity maps and generate sparse yet reliable pseudo-labels to bridge the domain gap. Experimentally, our method shows strong cross-domain, adaptation, and joint generalization, and obtained 1st place on the stereo task of the Robust Vision Challenge 2020. Additionally, our uncertainty-based pseudo-labels can be extended to train monocular depth estimation networks in an unsupervised way, achieving performance comparable to supervised methods.
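As a rough illustration of the pseudo-labeling step, the sketch below masks out high-uncertainty pixels to leave a sparse but reliable label map. It is our own simplification: the thresholds, patch size, and area-level rule are illustrative stand-ins, not the paper's settings.

```python
# Hedged sketch: build sparse pseudo-labels from a predicted disparity map
# and a per-pixel uncertainty map by discarding unreliable pixels.
import numpy as np

def sparse_pseudo_labels(disparity, uncertainty, pixel_thresh=0.3,
                         area_frac=0.5, patch=16):
    """Keep a pixel only if (a) its own uncertainty is low and (b) its local
    patch is mostly low-uncertainty (a stand-in for the paper's area-level
    check). Discarded pixels are marked NaN."""
    h, w = disparity.shape
    keep = uncertainty < pixel_thresh                 # pixel-level filter
    for y in range(0, h, patch):                      # area-level filter
        for x in range(0, w, patch):
            block = keep[y:y + patch, x:x + patch]
            if block.mean() < area_frac:              # patch too unreliable
                block[:] = False
    return np.where(keep, disparity, np.nan)

disp = np.random.rand(64, 128) * 192                  # toy disparity map
unc = np.random.rand(64, 128)                         # toy uncertainty map
labels = sparse_pseudo_labels(disp, unc)
print("pseudo-label density:", np.isfinite(labels).mean())
```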
Citations: 0
Supervision by Denoising.
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-07-28 DOI: 10.1109/TPAMI.2023.3299789
Sean I Young, Adrian V Dalca, Enzo Ferrante, Polina Golland, Christopher A Metzler, Bruce Fischl, Juan Eugenio Iglesias

Learning-based image reconstruction models, such as those based on the U-Net, require a large set of labeled images if good generalization is to be guaranteed. In some imaging domains, however, labeled data with pixel- or voxel-level label accuracy are scarce due to the cost of acquiring them. This problem is exacerbated further in domains like medical imaging, where there is no single ground-truth label, resulting in large amounts of repeat variability in the labels. Therefore, training reconstruction networks to generalize better by learning from both labeled and unlabeled examples (called semi-supervised learning) is a problem of practical and theoretical interest. However, traditional semi-supervised learning methods for image reconstruction often necessitate handcrafting a differentiable regularizer specific to a given imaging problem, which can be extremely time-consuming. In this work, we propose "supervision by denoising" (SUD), a framework that supervises reconstruction models using their own denoised output as labels. SUD unifies stochastic averaging and spatial denoising techniques under a spatio-temporal denoising framework and alternates denoising and model weight update steps in an optimization framework for semi-supervision. As example applications, we apply SUD to two problems from biomedical imaging, anatomical brain reconstruction (3D) and cortical parcellation (2D), to demonstrate a significant improvement in reconstruction over supervised-only and ensembling baselines. Our code is available at https://github.com/seannz/sud.
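A minimal sketch of the alternating scheme on unlabeled data, under our own simplifying assumptions: a toy one-layer "network", mean filtering as the spatial denoiser, and an exponential moving average as the stochastic-averaging step. None of this reflects the authors' actual configuration.

```python
# Hedged sketch of supervision by denoising: the model's temporally averaged
# and spatially smoothed predictions on unlabeled inputs serve as its labels.
import torch
import torch.nn.functional as F

model = torch.nn.Conv2d(1, 1, 3, padding=1)      # stand-in reconstruction net
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x_unlab = torch.rand(4, 1, 32, 32)               # a fixed unlabeled batch
ema_pred = None                                  # running average of predictions
beta = 0.9                                       # averaging rate (illustrative)

def spatial_denoise(x):                          # crude spatial denoiser
    return F.avg_pool2d(x, 3, stride=1, padding=1)

for step in range(100):
    pred = model(x_unlab)
    with torch.no_grad():                        # denoising step: average over
        p = pred.detach()                        # iterations, then smooth
        ema_pred = p if ema_pred is None else beta * ema_pred + (1 - beta) * p
        target = spatial_denoise(ema_pred)
    loss = F.mse_loss(pred, target)              # weight-update step
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the full framework these two steps regularize a semi-supervised objective that also includes a supervised loss on the labeled subset; the sketch keeps only the unlabeled branch.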

Citations: 0
Using Zodiacal Light For Spaceborne Calibration Of Polarimetric Imagers.
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-07-27 DOI: 10.1109/TPAMI.2023.3299526
Or Avitan, Yoav Y Schechner, Ehud Behar

We propose that spaceborne polarimetric imagers can be calibrated, or self-calibrated, using zodiacal light (ZL). ZL is created by a cloud of interplanetary dust particles. It has a significant degree of polarization over a wide field of view. From space, ZL is unaffected by terrestrial disturbances. ZL is insensitive to the camera location, so it is suited for simultaneous cross-calibration of satellite constellations. ZL changes on a scale of months, thus serving as a quasi-constant target in realistic calibration sessions. We derive a forward model for polarimetric image formation. Based on it, we formulate an inverse problem for polarimetric calibration and self-calibration, as well as an algorithm for its solution. The methods are demonstrated in simulations, for which we render polarized images of the sky, including ZL from space, polarimetric disturbances, and imaging noise.
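For intuition, the toy sketch below fits per-channel gains of a four-angle polarimeter against a known reference Stokes vector, using the textbook forward model I = 0.5 g (S0 + S1 cos 2θ + S2 sin 2θ). This is our illustration only: the reference values stand in for ZL, and the paper's forward model and inverse problem are richer than this.

```python
# Hedged sketch: calibrate per-channel polarimeter gains from one observation
# of a reference target with a known Stokes vector (here, a stand-in for ZL).
import numpy as np

thetas = np.deg2rad([0, 45, 90, 135])            # analyzer angles
S_ref = np.array([1.0, 0.15, 0.05])              # assumed known (S0, S1, S2)

true_g = np.array([1.05, 0.97, 1.02, 0.95])      # unknown per-channel gains
ideal = 0.5 * (S_ref[0] + S_ref[1] * np.cos(2 * thetas)
               + S_ref[2] * np.sin(2 * thetas))  # ideal polarimeter response
meas = true_g * ideal + 1e-3 * np.random.randn(4)

g_hat = meas / ideal                             # single target: direct ratio
print("gain estimates:", g_hat)
```

With several ZL observations accumulated over months, the per-channel ratios above would become rows of an overdetermined system solved jointly by least squares.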

Citations: 0
Human Motion Generation: A Survey
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-07-20 DOI: 10.48550/arXiv.2307.10894
Wentao Zhu, Xiaoxuan Ma, Dongwoo Ro, Hai Ci, Jinlu Zhang, Jiaxin Shi, Feng Gao, Qi Tian, Yizhou Wang
Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications. Substantial progress has been made recently in motion data collection technologies and generation methods, laying the foundation for increasing interest in human motion generation. Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts. While significant advancements have been made in recent years, the task continues to pose challenges due to the intricate nature of human motion and its implicit relationship with conditional signals. In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field. We begin by introducing the background of human motion and generative models, followed by an examination of representative methods for three mainstream sub-tasks: text-conditioned, audio-conditioned, and scene-conditioned human motion generation. Additionally, we provide an overview of common datasets and evaluation metrics. Lastly, we discuss open problems and outline potential future research directions. We hope that this survey could provide the community with a comprehensive glimpse of this rapidly evolving field and inspire novel ideas that address the outstanding challenges.
Citations: 0
Count-Free Single-Photon 3D Imaging with Race Logic
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-07-10 DOI: 10.48550/arXiv.2307.04924
A. Ingle, David Maier
Single-photon cameras (SPCs) have emerged as a promising new technology for high-resolution 3D imaging. A single-photon 3D camera determines the round-trip time of a laser pulse by precisely capturing the arrival of individual photons at each camera pixel. Constructing photon-timestamp histograms is a fundamental operation for a single-photon 3D camera. However, in-pixel histogram processing is computationally expensive and requires a large amount of memory per pixel. Digitizing and transferring photon timestamps to an off-sensor histogramming module is bandwidth- and power-hungry. Can we estimate distances without explicitly storing photon counts? Yes: here we present an online approach for distance estimation suitable for resource-constrained settings with limited bandwidth, memory, and compute. The two key ingredients of our approach are (a) processing photon streams using race logic, which maintains photon data in the time-delay domain, and (b) constructing count-free equi-depth histograms as opposed to conventional equi-width histograms. Equi-depth histograms are a more succinct representation for "peaky" distributions, such as those obtained by an SPC pixel from a laser pulse reflected by a surface. Our approach uses a binner element that converges on the median (or, more generally, another k-quantile) of a distribution. We cascade multiple binners to form an equi-depth histogrammer that produces multi-bin histograms. Our evaluation shows that this method can reduce bandwidth and power consumption by at least an order of magnitude while maintaining distance-reconstruction accuracy similar to conventional histogram-based processing methods.
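The binner's convergence to a quantile can be imitated in software with a single running register, in the spirit of (though not implementing) the paper's race-logic hardware. Below is a sketch under our own assumptions; the step size and the toy timestamp distribution are illustrative.

```python
# Hedged sketch: track a k-quantile of a photon-timestamp stream without
# storing counts, by nudging one register up or down per sample.
import numpy as np

def streaming_quantile(samples, q=0.5, step=0.5):
    """Nudge the estimate up on samples above it and down on samples below,
    with asymmetric step sizes so it converges to the q-th quantile."""
    est = samples[0]
    for s in samples[1:]:
        est += step * q if s > est else -step * (1 - q)
    return est

rng = np.random.default_rng(1)
# "Peaky" distribution: uniform background plus one strong surface return.
ts = np.concatenate([rng.uniform(0, 100, 2000), rng.normal(42, 0.5, 3000)])
rng.shuffle(ts)
print("median estimate:", streaming_quantile(ts))
print("true median    :", np.median(ts))
```

One plausible cascade feeds each child binner only the samples falling on one side of its parent's estimate, yielding the multi-bin equi-depth histogrammer described above.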
Citations: 1
Towards Scalable Multi-View Reconstruction of Geometry and Materials
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-06-06 DOI: 10.48550/arXiv.2306.03747
Carolin Schmitt, B. Antić, Andrei Neculai, J. Lee, Andreas Geiger
In this paper, we propose a novel method for joint recovery of camera pose, object geometry, and the spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes that exceed object scale and hence cannot be captured with stationary light stages. The inputs are high-resolution RGB-D images captured by a mobile, hand-held capture system with point lights for active illumination. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. To facilitate scalability to large numbers of observation views and optimization variables, we introduce a distributed optimization algorithm that reconstructs 2.5D keyframe-based representations of the scene. A novel multi-view consistency regularizer effectively synchronizes neighboring keyframes such that the local optimization results allow for seamless integration into a globally consistent 3D model. We provide a study of the importance of each component in our formulation and show that our method compares favorably to baselines. We further demonstrate that our method accurately reconstructs various objects and materials and allows for expansion to spatially larger scenes. We believe that this work represents a significant step towards making geometry and material estimation from hand-held scanners scalable.
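To make the multi-view consistency idea concrete, here is a small geometric sketch of our own (pinhole cameras, nearest-neighbor lookups, no occlusion handling; this is not the authors' regularizer): it reprojects one keyframe's depth into a neighboring keyframe and measures the depth disagreement that a consistency term would penalize.

```python
# Hedged sketch: depth-consistency residuals between two keyframes.
import numpy as np

def backproject(depth, K):
    """Lift a depth map to 3D points in the camera frame (3 x N)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T
    return np.linalg.inv(K) @ pix * depth.reshape(-1)

def consistency_residual(depth_i, depth_j, K, T_ji):
    """Transform keyframe i's points into frame j and compare the induced
    depths with depth_j at the projected pixels."""
    X_i = backproject(depth_i, K)
    X_j = T_ji[:3, :3] @ X_i + T_ji[:3, 3:4]
    pix = K @ X_j
    u = np.round(pix[0] / pix[2]).astype(int)
    v = np.round(pix[1] / pix[2]).astype(int)
    h, w = depth_j.shape
    ok = (0 <= u) & (u < w) & (0 <= v) & (v < h) & (X_j[2] > 0)
    return X_j[2, ok] - depth_j[v[ok], u[ok]]     # per-point depth residual

K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
depth = np.full((64, 64), 2.0)                    # toy planar keyframe depth
res = consistency_residual(depth, depth, K, np.eye(4))
print("mean |residual|:", np.abs(res).mean())     # ~0 for identical keyframes
```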
Citations: 0
Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-05-31 DOI: 10.48550/arXiv.2305.19868
Yang‐Zhi Hu, Qian Zheng, Xudong Jiang, Gang Pan
Spiking neural networks (SNNs) have shown advantages in computation and energy efficiency over traditional artificial neural networks (ANNs) thanks to their event-driven representations. SNNs also replace the weight multiplications of ANNs with additions, which are more energy-efficient and less computationally intensive. However, training deep SNNs remains a challenge due to the discrete spiking function. A popular approach to circumvent this challenge is ANN-to-SNN conversion; however, due to quantization error and accumulating error, it often requires many time steps (high inference latency) to achieve high performance, which negates SNNs' advantages. To this end, this paper proposes Fast-SNN, which achieves high performance with low latency. We demonstrate an equivalent mapping between temporal quantization in SNNs and spatial quantization in ANNs, based on which the minimization of the quantization error is transferred to quantized ANN training. With the quantization error minimized, we show that the sequential error is the primary cause of the accumulating error, which we address by introducing a signed IF neuron model and a layer-wise fine-tuning mechanism. Our method achieves state-of-the-art performance and low latency on various computer vision tasks, including image classification, object detection, and semantic segmentation. Code is available at: https://github.com/yangfan-hu/Fast-SNN.
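The signed IF neuron can be sketched in a few lines. The model below is our assumption of the mechanism, following the common integrate-and-fire formulation with an added negative threshold, not the paper's exact equations: negative spikes let a neuron retract earlier over-firing, which is what suppresses the sequential error.

```python
# Hedged sketch: a signed integrate-and-fire neuron emitting +1/-1 spikes.
import numpy as np

def signed_if(inputs, thr=1.0):
    v, spikes = 0.0, []
    for x in inputs:
        v += x                       # integrate the input current
        if v >= thr:
            spikes.append(+1)        # positive spike, soft reset
            v -= thr
        elif v <= -thr:
            spikes.append(-1)        # negative spike cancels over-firing
            v += thr
        else:
            spikes.append(0)
    return np.array(spikes)

x = np.array([0.6, 0.6, -0.9, 0.4, 0.5, -0.3])
s = signed_if(x)
print("spikes:", s)
print("rate-coded output:", s.sum(), "| input sum:", round(float(x.sum()), 2))
```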
Citations: 0
TextSLAM: Visual SLAM with Semantic Planar Text Features
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-05-17 DOI: 10.48550/arXiv.2305.10029
Boying Li, Danping Zou, Yuan Huang, Xinghan Niu, Ling Pei, Wenxian Yu
We propose a novel visual SLAM method that tightly integrates text objects by treating them as semantic features and fully exploring their geometric and semantic priors. Each text object is modeled as a texture-rich planar patch whose semantic meaning is extracted and updated on the fly for better data association. With full exploitation of the locally planar characteristics and semantic meaning of text objects, the SLAM system becomes more accurate and robust even under challenging conditions such as image blurring, large viewpoint changes, and significant illumination variations (day and night). We tested our method in various scenes with ground-truth data. The results show that integrating text features leads to a superior SLAM system that can match images across day and night. The reconstructed semantic 3D text map could be useful for navigation and scene understanding in robotic and mixed-reality applications. (Project page: https://github.com/SJTU-ViSYS/TextSLAM.)
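The planar-patch modeling leans on a standard multi-view geometry result: a 3D plane induces a homography between two calibrated views, H = K (R - t n^T / d) K^{-1}. The sketch below shows that textbook formula; the intrinsics, pose, and plane parameters are made up, and this is our illustration rather than TextSLAM's code.

```python
# Hedged sketch: warp a text-patch pixel between views via the plane-induced
# homography H = K (R - t n^T / d) K^{-1}.
import numpy as np

def plane_homography(K, R, t, n, d):
    return K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R = np.eye(3)                        # relative rotation (identity for the demo)
t = np.array([0.1, 0.0, 0.0])        # relative translation
n = np.array([0.0, 0.0, 1.0])        # text-plane normal in the reference view
d = 2.0                              # plane distance from the reference camera

H = plane_homography(K, R, t, n, d)
p = np.array([320.0, 240.0, 1.0])    # a text-patch pixel in the reference view
q = H @ p
print("warped pixel:", q[:2] / q[2])
```

Warping patches this way is one standard route to comparing a planar text region photometrically across views.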
Citations: 0
Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation
IF 23.6 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-04-26 DOI: 10.48550/arXiv.2304.13615
Lukas Hoyer, Dengxin Dai, L. Van Gool
Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or even unseen target domains. As previous UDA&DG semantic segmentation methods are mostly based on outdated networks, we benchmark more recent architectures, reveal the potential of Transformers, and design the DAFormer network tailored for UDA&DG. It is enabled by three training strategies to avoid overfitting to the source domain: While (1) Rare Class Sampling mitigates the bias toward common source domain classes, (2) a Thing-Class ImageNet Feature Distance and (3) a learning rate warmup promote feature transfer from ImageNet pretraining. As UDA&DG are usually GPU memory intensive, most previous methods downscale or crop images. However, low-resolution predictions often fail to preserve fine details while models trained with cropped images fall short in capturing long-range, domain-robust context information. Therefore, we propose HRDA, a multi-resolution framework for UDA&DG, that combines the strengths of small high-resolution crops to preserve fine segmentation details and large low-resolution crops to capture long-range context dependencies with a learned scale attention. DAFormer and HRDA significantly improve the state-of-the-art UDA&DG by more than 10 mIoU on 5 different benchmarks. The implementation is available at https://github.com/lhoyer/HRDA.
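As an example of the first strategy, Rare Class Sampling can be written as a temperature-controlled softmax over class rarity. The sketch below follows the paper's description at a high level, but the class frequencies and temperature here are invented for illustration.

```python
# Hedged sketch of Rare Class Sampling: sample source classes with a
# probability that decays with their pixel frequency, so rare classes
# are seen more often during adaptation.
import numpy as np

def rcs_probs(class_freq, T=0.1):
    """Softmax over (1 - f_c) / T: rarer classes get higher probability."""
    f = np.asarray(class_freq, dtype=float)
    logits = (1.0 - f) / T
    logits -= logits.max()                    # numerical stability
    p = np.exp(logits)
    return p / p.sum()

freq = [0.35, 0.30, 0.20, 0.10, 0.04, 0.01]   # made-up pixel frequencies
p = rcs_probs(freq)
rng = np.random.default_rng(0)
c = rng.choice(len(freq), p=p)                # class to sample an image for
print("sampling probs:", np.round(p, 3), "| sampled class:", c)
```

A smaller temperature skews sampling harder toward the rarest classes; T acts as a hyperparameter controlling that trade-off.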
Citations: 0