Latest publications in IEEE Transactions on Pattern Analysis and Machine Intelligence

Towards Effective Causal Partitioning by Edge Cutting of Adjoint Graph.
Pub Date: 2024-07-30 DOI: 10.1109/TPAMI.2024.3435503
Hao Zhang, Yixin Ren, Yewei Xia, Shuigeng Zhou, Jihong Guan

Causal partitioning is an effective approach for causal discovery based on the divide-and-conquer strategy. To date, various heuristic methods based on conditional independence (CI) tests have been proposed for causal partitioning. However, most of these methods fail to achieve satisfactory partitioning without violating d-separation, leading to poor inference performance. In this work, we transform causal partitioning into an alternative problem that can be solved more easily. Concretely, we first construct a superstructure G of the true causal graph G_T by performing a set of low-order CI tests on the observed data D. Then, we leverage point-line duality to obtain a graph G_A adjoint to G. We show that minimizing the edge-cut ratio on G_A yields a valid causal partitioning with a smaller causal-cut ratio on G that does not violate d-separation. We design an efficient algorithm to solve this problem. Extensive experiments show that the proposed method achieves significantly better causal partitioning than existing methods without violating d-separation. The source code and data are available at https://github.com/hzsiat/CPA.
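
As a rough illustration of the pipeline described above, the Python sketch below builds a superstructure from order-0 CI tests, takes its line graph as the adjoint graph (the point-line duality step), and bisects that graph along a small edge cut. The marginal-correlation threshold `thresh`, the use of networkx's Kernighan-Lin bisection, and the function names are illustrative assumptions, not the authors' algorithm or their edge-cut-ratio objective.

```python
import itertools

import networkx as nx
import numpy as np


def superstructure(data, thresh=0.05):
    """Order-0 CI proxy: drop edge (i, j) when |corr(X_i, X_j)| is negligible."""
    n_vars = data.shape[1]
    g = nx.complete_graph(n_vars)
    corr = np.corrcoef(data, rowvar=False)
    for i, j in itertools.combinations(range(n_vars), 2):
        if abs(corr[i, j]) < thresh:
            g.remove_edge(i, j)
    return g


def causal_partition(data):
    g = superstructure(data)
    g_adj = nx.line_graph(g)  # nodes of the adjoint graph are edges of g
    # a classical small-edge-cut heuristic stands in for the paper's solver
    return nx.algorithms.community.kernighan_lin_bisection(g_adj)


rng = np.random.default_rng(0)
z = rng.normal(size=(500, 6))
x = z @ rng.normal(size=(6, 6))  # linear mixing -> correlated variables
part_a, part_b = causal_partition(x)
print(sorted(part_a), sorted(part_b))
```

Each side of the cut is a set of edges of the superstructure, i.e., a subproblem that a causal-discovery routine could then solve independently.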

Citations: 0
End-to-end Autonomous Driving: Challenges and Frontiers.
Pub Date: 2024-07-30 DOI: 10.1109/TPAMI.2024.3435937
Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li

The autonomous driving community has witnessed rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. We maintain an active repository that contains up-to-date literature and open-source projects at https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving.
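
The survey's central design, one differentiable map from raw sensor input to a motion plan so that perception and planning features receive a joint training signal, can be made concrete with a toy PyTorch model. The layer sizes, the four-waypoint plan, and the L1 imitation loss below are assumptions for illustration, not any architecture reviewed in the survey.

```python
import torch
import torch.nn as nn


class EndToEndPlanner(nn.Module):
    """Raw camera frames in, a short waypoint plan out: no hand-built interface."""

    def __init__(self, n_waypoints=4):
        super().__init__()
        self.perception = nn.Sequential(  # shared visual backbone
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.planner = nn.Linear(32, n_waypoints * 2)  # (x, y) per waypoint
        self.n_waypoints = n_waypoints

    def forward(self, camera):
        feat = self.perception(camera)
        return self.planner(feat).view(-1, self.n_waypoints, 2)


model = EndToEndPlanner()
frames = torch.randn(8, 3, 128, 128)  # a batch of camera frames
expert = torch.randn(8, 4, 2)         # dummy expert waypoints
loss = nn.functional.l1_loss(model(frames), expert)
loss.backward()  # one gradient step optimizes perception and planning jointly
```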

Citations: 0
Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks.
Pub Date: 2024-07-29 DOI: 10.1109/TPAMI.2024.3435055
Shixuan Liu, Changjun Fan, Kewei Cheng, Yunfei Wang, Peng Cui, Yizhou Sun, Zhong Liu

Heterogeneous Information Networks (HINs) are information networks with multiple types of nodes and edges. The concept of meta-path, i.e., a sequence of entity types and relation types connecting two entities, was proposed to provide meta-level explainable semantics for various HIN tasks. Traditionally, meta-paths have primarily been used for schema-simple HINs, e.g., bibliographic networks with only a few entity types, where meta-paths are often enumerated with domain knowledge. However, the adoption of meta-paths for schema-complex HINs, such as knowledge bases (KBs) with hundreds of entity and relation types, has been limited due to the computational complexity associated with meta-path enumeration. Additionally, effectively assessing meta-paths requires enumerating relevant path instances, which adds further complexity to the meta-path learning process. To address these challenges, we propose SchemaWalk, an inductive meta-path learning framework for schema-complex HINs. We represent meta-paths with schema-level representations to support learning the scores of meta-paths for varying relations, mitigating the need for exhaustive path-instance enumeration for each relation. Further, we design a reinforcement-learning-based path-finding agent, which directly navigates the network schema (i.e., the schema graph) to learn policies for establishing meta-paths with high coverage and confidence for multiple relations. Extensive experiments on real data sets demonstrate the effectiveness of our proposed paradigm.
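
To make "navigating the network schema" concrete, the sketch below samples meta-paths by walking over entity types rather than instance-level paths, which is what keeps the search tractable for schema-complex HINs. The toy bibliographic schema and the uniform random policy are illustrative placeholders for the paper's learned reinforcement-learning policy.

```python
import random

# schema graph: entity type -> [(relation type, target entity type), ...]
SCHEMA = {
    "Author": [("writes", "Paper")],
    "Paper": [("written_by", "Author"), ("published_in", "Venue"),
              ("cites", "Paper")],
    "Venue": [("publishes", "Paper")],
}


def sample_meta_path(start, end, max_hops=4, rng=random.Random(0)):
    """Walk the schema until `end` is reached, or give up after max_hops."""
    path, node = [start], start
    for _ in range(max_hops):
        rel, nxt = rng.choice(SCHEMA[node])  # uniform stand-in for the RL policy
        path += [rel, nxt]
        node = nxt
        if node == end:
            return path
    return None


# prints one type-level meta-path such as
# ['Author', 'writes', 'Paper', 'published_in', 'Venue'],
# or None if the walk gives up
print(sample_meta_path("Author", "Venue"))
```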

Citations: 0
QKSAN: A Quantum Kernel Self-Attention Network.
Pub Date: 2024-07-29 DOI: 10.1109/TPAMI.2024.3434974
Ren-Xin Zhao, Jinjing Shi, Xuelong Li

The Self-Attention Mechanism (SAM) excels at distilling important information from the interior of data to improve the computational efficiency of models. Nevertheless, many Quantum Machine Learning (QML) models lack SAM's ability to discern the intrinsic connections within information, which limits their effectiveness on massive high-dimensional quantum data. To tackle this issue, a Quantum Kernel Self-Attention Mechanism (QKSAM) is introduced to combine the data representation merit of Quantum Kernel Methods (QKM) with the efficient information extraction capability of SAM. Further, a Quantum Kernel Self-Attention Network (QKSAN) framework is proposed based on QKSAM, which ingeniously incorporates the Deferred Measurement Principle (DMP) and conditional measurement techniques to release half of the quantum resources through mid-circuit measurement, thereby bolstering both feasibility and adaptability. Simultaneously, the Quantum Kernel Self-Attention Score (QKSAS), with an exponentially large characterization space, is introduced to accommodate more information and determine the measurement conditions. Finally, four QKSAN sub-models are deployed on the PennyLane and IBM Qiskit platforms to perform binary classification on MNIST and Fashion MNIST, where the QKSAS tests and correlation assessments between noise immunity and learning ability are executed on the best-performing sub-model. The paramount experimental finding is that the QKSAN sub-models can achieve accuracies exceeding 98.05% with far fewer parameters than classical machine learning models. Predictably, QKSAN lays the foundation for future quantum computers to perform machine learning on massive amounts of data while driving advances in areas such as quantum computer vision.
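
The kernel self-attention score at the heart of QKSAN can be previewed with a classical NumPy analogue: the usual dot-product attention logit is replaced by a kernel evaluation k(q_i, k_j). The RBF kernel below merely stands in for the quantum kernel the paper estimates on a circuit, and `gamma` and the tensor shapes are arbitrary assumptions.

```python
import numpy as np


def kernel_self_attention(x, w_q, w_k, w_v, gamma=1.0):
    """Attention where score(i, j) = k(q_i, k_j) for an RBF kernel k."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d2 = ((q[:, None, :] - k[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    scores = np.exp(-gamma * d2)                         # kernel as the logit
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))  # 5 tokens, 8 features each
w = [0.1 * rng.normal(size=(8, 8)) for _ in range(3)]
print(kernel_self_attention(tokens, *w).shape)  # (5, 8)
```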

Citations: 0
Transformer-Based Visual Segmentation: A Survey.
Pub Date: 2024-07-29 DOI: 10.1109/TPAMI.2024.3434373
Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy

Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several specific subfields, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research. The project page can be found at https://github.com/lxtGH/Awesome-Segmentation-With-Transformer.
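
The meta-architecture the survey distills can be sketched as a query-based mask-classification head: a fixed set of learnable object queries cross-attends to pixel features, and each query then emits a class score plus, via a dot product with the pixel embedding, a mask. The single decoder layer and all dimensions in this PyTorch sketch are assumptions; reviewed systems stack many layers and add a pixel decoder.

```python
import torch
import torch.nn as nn


class QueryMaskHead(nn.Module):
    def __init__(self, dim=64, n_queries=8, n_classes=10):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        self.cross_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.cls_head = nn.Linear(dim, n_classes + 1)  # +1 for "no object"

    def forward(self, pixel_feat):                     # (B, dim, H, W)
        b = pixel_feat.shape[0]
        mem = pixel_feat.flatten(2).transpose(1, 2)    # (B, HW, dim)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        q, _ = self.cross_attn(q, mem, mem)            # queries read the pixels
        logits = self.cls_head(q)                      # (B, Q, n_classes + 1)
        masks = torch.einsum("bqc,bchw->bqhw", q, pixel_feat)
        return logits, masks


head = QueryMaskHead()
cls_logits, mask_logits = head(torch.randn(2, 64, 32, 32))
print(cls_logits.shape, mask_logits.shape)  # (2, 8, 11) (2, 8, 32, 32)
```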

Citations: 0
Structure and Intensity Unbiased Translation for 2D Medical Image Segmentation.
Pub Date: 2024-07-29 DOI: 10.1109/TPAMI.2024.3434435
Tianyang Zhang, Shaoming Zheng, Jun Cheng, Xi Jia, Joseph Bartlett, Xinxing Cheng, Zhaowen Qiu, Huazhu Fu, Jiang Liu, Ales Leonardis, Jinming Duan

Data distribution gaps often pose significant challenges to the use of deep segmentation models. However, retraining models for each distribution is expensive and time-consuming. In clinical contexts, device-embedded algorithms and networks, typically not retrainable or accessible after manufacture, exacerbate this issue. Generative translation methods offer a solution to mitigate the gap by transferring data across domains. However, existing methods mainly focus on intensity distributions while ignoring the gaps caused by structure disparities. In this paper, we formulate a new image-to-image translation task to reduce structural gaps. We propose a simple yet powerful Structure-Unbiased Adversarial (SUA) network which accounts for both intensity and structural differences between the training and test sets for segmentation. It consists of a spatial transformation block followed by an intensity distribution rendering module. The spatial transformation block reduces the structural gaps between the two images. The intensity distribution rendering module then renders the deformed structure into an image with the target intensity distribution. Experimental results show that the proposed SUA method can transfer both intensity distribution and structural content between multiple pairs of datasets and is superior to prior art in closing these gaps to improve segmentation.
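
A compact PyTorch sketch of the two-stage idea follows, assuming an affine warp for the spatial transformation block and AdaIN-style channel-statistics matching for the intensity distribution rendering module; the paper's learned deformation and rendering modules are more elaborate, so treat this as mechanics only.

```python
import torch
import torch.nn.functional as F


def spatial_transform(img, theta):
    """Warp img (B, C, H, W) with per-sample 2x3 affine matrices theta."""
    grid = F.affine_grid(theta, img.shape, align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)


def render_intensity(img, target):
    """Match img's per-channel mean/std to the target domain (AdaIN-style)."""
    mu_s, sd_s = img.mean((2, 3), keepdim=True), img.std((2, 3), keepdim=True)
    mu_t, sd_t = target.mean((2, 3), keepdim=True), target.std((2, 3), keepdim=True)
    return (img - mu_s) / (sd_s + 1e-5) * sd_t + mu_t


src = torch.rand(1, 1, 64, 64)  # source-domain scan
tgt = torch.rand(1, 1, 64, 64)  # target-domain reference
theta = torch.tensor([[[1.0, 0.0, 0.05], [0.0, 1.0, 0.0]]])  # small x-shift
out = render_intensity(spatial_transform(src, theta), tgt)
print(out.shape)  # torch.Size([1, 1, 64, 64])
```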

Citations: 0
Adaptive Neural Message Passing for Inductive Learning on Hypergraphs.
Pub Date: 2024-07-26 DOI: 10.1109/TPAMI.2024.3434483
Devanshu Arya, Deepak K Gupta, Stevan Rudinac, Marcel Worring

Graphs are the most ubiquitous data structures for representing relational datasets and performing inference over them. They model, however, only pairwise relations between nodes and are not designed to encode higher-order relations. This drawback is mitigated by hypergraphs, in which an edge can connect an arbitrary number of nodes. Most hypergraph learning approaches convert the hypergraph structure to that of a graph and then deploy existing geometric deep learning methods. This transformation leads to information loss and sub-optimal exploitation of the hypergraph's expressive power. We present HyperMSG, a novel hypergraph learning framework that uses a modular two-level neural message passing strategy to accurately and efficiently propagate information within each hyperedge and across hyperedges. HyperMSG adapts to the data and task by learning an attention weight associated with each node's degree centrality. Such a mechanism quantifies both the local and global importance of a node, capturing the structural properties of a hypergraph. HyperMSG is inductive, allowing inference on previously unseen nodes. Further, it is robust and outperforms state-of-the-art hypergraph learning methods on a wide range of tasks and datasets. Finally, we demonstrate the effectiveness of HyperMSG in learning multimodal relations through detailed experimentation on a challenging multimedia dataset.
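
The two-level message passing can be written in a few lines of NumPy over the incidence matrix H (nodes x hyperedges): features are averaged node-to-hyperedge, then hyperedge-to-node. The degree-based scalar weight below is a crude stand-in for the learned attention on degree centrality that the abstract describes; the incidence matrix and features are toy values.

```python
import numpy as np

H = np.array([[1, 0],   # incidence: node 0 lies in hyperedge 0
              [1, 1],   # node 1 lies in both hyperedges
              [0, 1],
              [1, 0]], dtype=float)
X = np.arange(8, dtype=float).reshape(4, 2)  # 4 nodes, 2 features


def hyper_message_pass(H, X):
    edge_deg = H.sum(0)                  # nodes per hyperedge
    node_deg = H.sum(1)                  # hyperedges per node
    E = (H / edge_deg).T @ X             # level 1: mean within each hyperedge
    w = node_deg / node_deg.max()        # degree-centrality weight (stand-in)
    return w[:, None] * ((H / node_deg[:, None]) @ E)  # level 2: back to nodes


print(hyper_message_pass(H, X))
```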

Citations: 0
Ego4D: Around the World in 3,000 Hours of Egocentric Video.
Pub Date: 2024-07-26 DOI: 10.1109/TPAMI.2024.3381075
Kristen Grauman, Andrew Westbury, Eugene Byrne, Vincent Cartillier, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Devansh Kukreja, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C V Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/.

Citations: 0
Matryoshka: Exploiting the Over-Parametrization of Deep Learning Models for Covert Data Transmission.
Pub Date: 2024-07-26 DOI: 10.1109/TPAMI.2024.3434417
Xudong Pan, Mi Zhang, Yifan Yan, Shengyao Zhang, Min Yang

High-quality private machine learning (ML) data stored in local data centers becomes a key competitive factor for AI corporations. In this paper, we present a novel insider attack called Matryoshka to reveal the possibility of breaking the privacy of ML data even with no exposed interface. Our attack employs a scheduled-to-publish DNN model as a carrier model for covert transmission of secret models which memorize the information of private ML data that otherwise has no interface to the outsider. At the core of our attack, we present a novel parameter-sharing approach which exploits the learning capacity of the carrier model for information hiding. Our approach simultaneously achieves: (i) High Capacity - With almost no utility loss of the carrier model, Matryoshka can transmit over 10,000 real-world data samples within a carrier model which has 220× fewer parameters than the total size of the stolen data, and simultaneously transmit multiple heterogeneous datasets or models within a single carrier model under a trivial distortion rate, neither of which can be done with existing steganography techniques; (ii) Decoding Efficiency - Once the published carrier model is downloaded, an outside colluder can exclusively decode the hidden models from the carrier model with only several integer secrets and the knowledge of the hidden model architecture; (iii) Effectiveness - Almost all the recovered models either perform similarly to models trained independently on the private data, or can be further used to extract memorized raw training data with low error; (iv) Robustness - Information redundancy is naturally implemented to achieve resilience against common post-processing techniques applied to the carrier before its publishing; (v) Covertness - A model inspector with different levels of prior knowledge could hardly differentiate a carrier model from a normal model.
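
The parameter-sharing mechanics can be illustrated with a toy NumPy encode/decode: an integer secret seeds a permutation that decides which flattened carrier weights double as the hidden model's weights, so a colluder holding the seed can read them back out. Real Matryoshka trains the carrier so the shared weights serve both models at once; this sketch simply overwrites them to show the decoding side, and `SECRET` and the sizes are made up.

```python
import numpy as np

SECRET = 1234567  # the shared integer secret (illustrative)


def hide(carrier, secret_params, seed):
    idx = np.random.default_rng(seed).permutation(carrier.size)
    stego = carrier.copy()
    stego[idx[: secret_params.size]] = secret_params  # shared positions
    return stego


def reveal(carrier, n_hidden, seed):
    idx = np.random.default_rng(seed).permutation(carrier.size)
    return carrier[idx[:n_hidden]]


rng = np.random.default_rng(0)
carrier_w = rng.normal(size=10_000)  # flattened carrier-model weights
hidden_w = rng.normal(size=50)       # flattened secret-model weights
stego_w = hide(carrier_w, hidden_w, SECRET)
assert np.allclose(reveal(stego_w, 50, SECRET), hidden_w)
```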

Citations: 0
Hybrid All-in-focus Imaging from Neuromorphic Focal Stack.
Pub Date: 2024-07-25 DOI: 10.1109/TPAMI.2024.3433607
Minggui Teng, Hanyue Lou, Yixin Yang, Tiejun Huang, Boxin Shi

Creating an image focal stack requires multiple shots that capture images focused at different depths within the same scene. Such methods are not suitable for scenes undergoing continuous change. Achieving an all-in-focus image from a single shot poses significant challenges due to the highly ill-posed nature of rectifying defocus and deblurring a single image. In this paper, to restore an all-in-focus image, we introduce the neuromorphic focal stack, defined as the neuromorphic signal streams captured by an event or spike camera during a continuous focal sweep. Given an RGB image focused at any distance, we harness the high temporal resolution of the neuromorphic signal streams: we automatically select refocusing timestamps and reconstruct the corresponding refocused images to form a focal stack. Guided by the neuromorphic signal around the selected timestamps, we merge the focal stack with proper weights and restore a sharp all-in-focus image. We test our method on two distinct neuromorphic cameras. Experimental results on both synthetic and real datasets demonstrate a marked improvement over existing state-of-the-art methods.
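
The final merge step can be sketched in NumPy: given refocused frames reconstructed at the selected timestamps, weight each pixel by a local sharpness measure and blend. The absolute-Laplacian sharpness proxy and the epsilon below are assumptions; the paper instead derives its merge weights from the neuromorphic signal around each selected timestamp.

```python
import numpy as np


def laplacian(img):
    """Discrete 4-neighbour Laplacian with wrap-around borders."""
    return (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
            np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)


def merge_focal_stack(stack):
    """stack: (N, H, W) refocused frames -> (H, W) all-in-focus estimate."""
    sharp = np.abs(np.stack([laplacian(f) for f in stack])) + 1e-8
    weights = sharp / sharp.sum(axis=0, keepdims=True)  # per-pixel normalization
    return (weights * stack).sum(axis=0)


rng = np.random.default_rng(0)
stack = rng.random((5, 64, 64))  # dummy refocused frames
print(merge_focal_stack(stack).shape)  # (64, 64)
```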

Citations: 0