Causal partitioning is an effective approach for causal discovery based on the divide-and-conquer strategy. Up to now, various heuristic methods based on conditional independence (CI) tests have been proposed for causal partitioning. However, most of these methods fail to achieve satisfactory partitioning without violating d-separation, leading to poor inference performance. In this work, we transform causal partitioning into an alternative problem that can be more easily solved. Concretely, we first construct a superstructure G of the true causal graph G_T by performing a set of low-order CI tests on the observed data D. Then, we leverage point-line duality to obtain a graph G_A adjoint to G. We show that minimizing the edge-cut ratio on G_A leads to a valid causal partitioning with a smaller causal-cut ratio on G and without violating d-separation. We design an efficient algorithm to solve this problem. Extensive experiments show that the proposed method achieves significantly better causal partitioning without violating d-separation than existing methods. The source code and data are available at https://github.com/hzsiat/CPA.
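The edge-cut objective can be made concrete with a toy sketch. Below is a minimal, illustrative construction of an adjoint (line) graph G_A from a superstructure G, plus an edge-cut ratio for a partition of its nodes. The function names and the ratio's exact normalization are assumptions for illustration, not the paper's implementation.

```python
from itertools import combinations

def line_graph(edges):
    # Adjoint (line) graph G_A: one node per edge of G; two nodes are
    # adjacent iff the corresponding edges of G share an endpoint.
    nodes = [frozenset(e) for e in edges]
    adj = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if a & b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def edge_cut_ratio(adj, part):
    # Fraction of line-graph adjacencies whose endpoints fall in
    # different blocks of the partition.
    cut = total = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            total += 1
            cut += part[u] != part[v]
    return cut / total if total else 0.0

# Toy superstructure G: the path a-b-c-d.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
GA = line_graph(edges)
part = {frozenset(("a", "b")): 0, frozenset(("b", "c")): 0,
        frozenset(("c", "d")): 1}
ratio = edge_cut_ratio(GA, part)
```

Cutting G_A here corresponds to splitting the edge set of G into two groups, which is the sense in which a low edge-cut ratio on the adjoint graph induces a partition of G.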
"Towards Effective Causal Partitioning by Edge Cutting of Adjoint Graph." Hao Zhang, Yixin Ren, Yewei Xia, Shuigeng Zhou, Jihong Guan. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-07-30. DOI: 10.1109/TPAMI.2024.3435503.
Pub Date: 2024-07-30. DOI: 10.1109/TPAMI.2024.3435937
Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li
The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. We maintain an active repository that contains up-to-date literature and open-source projects at https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving.
"End-to-end Autonomous Driving: Challenges and Frontiers." Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-07-30. DOI: 10.1109/TPAMI.2024.3435937.
Heterogeneous Information Networks (HINs) are information networks with multiple types of nodes and edges. The concept of meta-path, i.e., a sequence of entity types and relation types connecting two entities, is proposed to provide the meta-level explainable semantics for various HIN tasks. Traditionally, meta-paths are primarily used for schema-simple HINs, e.g., bibliographic networks with only a few entity types, where meta-paths are often enumerated with domain knowledge. However, the adoption of meta-paths for schema-complex HINs, such as knowledge bases (KBs) with hundreds of entity and relation types, has been limited due to the computational complexity associated with meta-path enumeration. Additionally, effectively assessing meta-paths requires enumerating relevant path instances, which adds further complexity to the meta-path learning process. To address these challenges, we propose SchemaWalk, an inductive meta-path learning framework for schema-complex HINs. We represent meta-paths with schema-level representations to support the learning of the scores of meta-paths for varying relations, mitigating the need of exhaustive path instance enumeration for each relation. Further, we design a reinforcement-learning based path-finding agent, which directly navigates the network schema (i.e., schema graph) to learn policies for establishing meta-paths with high coverage and confidence for multiple relations. Extensive experiments on real data sets demonstrate the effectiveness of our proposed paradigm.
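The coverage and confidence scores that the abstract attributes to meta-paths can be illustrated on a toy knowledge base. This is a generic rule-mining-style sketch, not SchemaWalk's reinforcement-learning agent; all entity and relation names below are invented for the example.

```python
from collections import defaultdict

def follow(triples, relations):
    # All (start, end) entity pairs connected by some path instance of
    # the given relation sequence (a schema-level meta-path).
    index = defaultdict(set)
    for h, r, t in triples:
        index[(h, r)].add(t)
    pairs = set()

    def walk(node, depth, start):
        if depth == len(relations):
            pairs.add((start, node))
            return
        for nxt in index[(node, relations[depth])]:
            walk(nxt, depth + 1, start)

    for h, r, t in triples:
        if r == relations[0]:
            walk(t, 1, h)
    return pairs

def coverage_confidence(triples, meta_path, target):
    # Coverage: fraction of target-relation pairs the meta-path reaches.
    # Confidence: fraction of meta-path pairs that hold the target relation.
    reached = follow(triples, meta_path)
    gold = {(h, t) for h, r, t in triples if r == target}
    hits = reached & gold
    return len(hits) / len(gold), len(hits) / len(reached)

triples = [
    ("alice", "works_at", "lab"), ("lab", "located_in", "paris"),
    ("alice", "lives_in", "paris"), ("bob", "works_at", "lab"),
]
# Score the meta-path works_at -> located_in against the relation lives_in.
cov, conf = coverage_confidence(triples, ["works_at", "located_in"], "lives_in")
```

Note this brute-force scoring enumerates path instances, which is exactly the cost the schema-level agent is designed to avoid on large KBs.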
"Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks." Shixuan Liu, Changjun Fan, Kewei Cheng, Yunfei Wang, Peng Cui, Yizhou Sun, Zhong Liu. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-07-29. DOI: 10.1109/TPAMI.2024.3435055.
Pub Date: 2024-07-29. DOI: 10.1109/TPAMI.2024.3434974
Ren-Xin Zhao, Jinjing Shi, Xuelong Li
The Self-Attention Mechanism (SAM) excels at distilling important information from the interior of data to improve the computational efficiency of models. Nevertheless, many Quantum Machine Learning (QML) models lack the ability to distinguish the intrinsic connections of information like SAM, which limits their effectiveness on massive high-dimensional quantum data. To tackle the above issue, a Quantum Kernel Self-Attention Mechanism (QKSAM) is introduced to combine the data representation merit of Quantum Kernel Methods (QKM) with the efficient information extraction capability of SAM. Further, a Quantum Kernel Self-Attention Network (QKSAN) framework is proposed based on QKSAM, which ingeniously incorporates the Deferred Measurement Principle (DMP) and conditional measurement techniques to release half of quantum resources by mid-circuit measurement, thereby bolstering both feasibility and adaptability. Simultaneously, the Quantum Kernel Self-Attention Score (QKSAS) with an exponentially large characterization space is spawned to accommodate more information and determine the measurement conditions. Eventually, four QKSAN sub-models are deployed on PennyLane and IBM Qiskit platforms to perform binary classification on MNIST and Fashion MNIST, where the QKSAS tests and correlation assessments between noise immunity and learning ability are executed on the best-performing sub-model. The paramount experimental finding is that the QKSAN subclasses possess the potential learning advantage of acquiring impressive accuracies exceeding 98.05% with far fewer parameters than classical machine learning models. Predictably, QKSAN lays the foundation for future quantum computers to perform machine learning on massive amounts of data while driving advances in areas such as quantum computer vision.
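To give a rough feel for combining kernel methods with self-attention, here is a purely classical numpy analogue in which the dot-product attention score is replaced by an RBF kernel similarity. The actual QKSAM evaluates its kernel on a quantum circuit with mid-circuit measurement; the `gamma` parameter, names, and shapes below are assumptions for illustration only.

```python
import numpy as np

def kernel_self_attention(X, Wq, Wk, Wv, gamma=1.0):
    # Replace the usual dot-product score with an RBF kernel similarity
    # k(q, k) = exp(-gamma * ||q - k||^2), then normalise each row and
    # mix the value vectors.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d2 = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(-1)
    scores = np.exp(-gamma * d2)
    attn = scores / scores.sum(-1, keepdims=True)
    return attn @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 tokens, 3 features each
Wq, Wk, Wv = (rng.normal(size=(3, 3)) for _ in range(3))
out = kernel_self_attention(X, Wq, Wk, Wv)
```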
"QKSAN: A Quantum Kernel Self-Attention Network." Ren-Xin Zhao, Jinjing Shi, Xuelong Li. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-07-29. DOI: 10.1109/TPAMI.2024.3434974.
Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several specific subfields, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research. The project page can be found at https://github.com/lxtGH/Awesome-Segmentation-With-Transformer.
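The unifying meta-architecture the survey describes ends, in most query-based methods, with mask prediction as a dot product between object-query embeddings and per-pixel decoder features. The sketch below illustrates only that final step; shapes and names are assumed for the example and do not correspond to any specific method's code.

```python
import numpy as np

def mask_prediction(queries, pixel_features):
    # Each object query yields a soft mask: the sigmoid of the dot
    # product between its embedding and every per-pixel feature.
    c, h, w = pixel_features.shape
    logits = queries @ pixel_features.reshape(c, h * w)   # (N, H*W)
    masks = 1.0 / (1.0 + np.exp(-logits))
    return masks.reshape(queries.shape[0], h, w)

queries = np.zeros((2, 3))           # 2 object queries, embedding dim 3
pixel_features = np.ones((3, 4, 5))  # C=3 features over a 4x5 grid
masks = mask_prediction(queries, pixel_features)
```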
"Transformer-Based Visual Segmentation: A Survey." Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-07-29. DOI: 10.1109/TPAMI.2024.3434373.
Pub Date: 2024-07-29. DOI: 10.1109/TPAMI.2024.3434435
Tianyang Zhang, Shaoming Zheng, Jun Cheng, Xi Jia, Joseph Bartlett, Xinxing Cheng, Zhaowen Qiu, Huazhu Fu, Jiang Liu, Ales Leonardis, Jinming Duan
Data distribution gaps often pose significant challenges to the use of deep segmentation models. However, retraining models for each distribution is expensive and time-consuming. In clinical contexts, device-embedded algorithms and networks, typically unretrainable and inaccessible post-manufacture, exacerbate this issue. Generative translation methods offer a solution to mitigate the gap by transferring data across domains. However, existing methods mainly focus on intensity distributions while ignoring the gaps due to structure disparities. In this paper, we formulate a new image-to-image translation task to reduce structural gaps. We propose a simple, yet powerful Structure-Unbiased Adversarial (SUA) network which accounts for both intensity and structural differences between the training and test sets for segmentation. It consists of a spatial transformation block followed by an intensity distribution rendering module. The spatial transformation block is proposed to reduce the structural gaps between the two images. The intensity distribution rendering module then renders the deformed structure to an image with the target intensity distribution. Experimental results show that the proposed SUA method can transfer both intensity distribution and structural content between multiple pairs of datasets and is superior to prior art in closing the gaps for improving segmentation.
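The two-stage design (spatial transform, then intensity rendering) can be mimicked with simple stand-ins: a fixed translation in place of the learned spatial transformation block, and quantile matching in place of the learned intensity distribution rendering module. This is only a structural analogy of the pipeline, not the SUA network itself.

```python
import numpy as np

def match_intensity(source, target):
    # Quantile (rank) matching: give the warped image exactly the
    # intensity distribution of the target image.
    out = np.empty_like(source, dtype=float).ravel()
    out[np.argsort(source.ravel())] = np.sort(target.ravel())
    return out.reshape(source.shape)

def sua_like_translate(image, target, shift=(1, 0)):
    # Stage 1: toy spatial transform (stands in for the learned block).
    warped = np.roll(image, shift, axis=(0, 1))
    # Stage 2: render the deformed structure with target intensities.
    return match_intensity(warped, target)

rng = np.random.default_rng(1)
image = rng.normal(size=(8, 8))
target = rng.normal(loc=5.0, size=(8, 8))
out = sua_like_translate(image, target)
```

After this toy pipeline, `out` keeps the (shifted) spatial structure of `image` while its intensity histogram matches `target`, which mirrors the decomposition the abstract describes.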
"Structure and Intensity Unbiased Translation for 2D Medical Image Segmentation." Tianyang Zhang, Shaoming Zheng, Jun Cheng, Xi Jia, Joseph Bartlett, Xinxing Cheng, Zhaowen Qiu, Huazhu Fu, Jiang Liu, Ales Leonardis, Jinming Duan. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-07-29. DOI: 10.1109/TPAMI.2024.3434435.
Pub Date: 2024-07-26. DOI: 10.1109/TPAMI.2024.3434483
Devanshu Arya, Deepak K Gupta, Stevan Rudinac, Marcel Worring
Graphs are the most ubiquitous data structures for representing relational datasets and performing inferences in them. They model, however, only pairwise relations between nodes and are not designed for encoding the higher-order relations. This drawback is mitigated by hypergraphs, in which an edge can connect an arbitrary number of nodes. Most hypergraph learning approaches convert the hypergraph structure to that of a graph and then deploy existing geometric deep learning methods. This transformation leads to information loss, and sub-optimal exploitation of the hypergraph's expressive power. We present HyperMSG, a novel hypergraph learning framework that uses a modular two-level neural message passing strategy to accurately and efficiently propagate information within each hyperedge and across the hyperedges. HyperMSG adapts to the data and task by learning an attention weight associated with each node's degree centrality. Such a mechanism quantifies both local and global importance of a node, capturing the structural properties of a hypergraph. HyperMSG is inductive, allowing inference on previously unseen nodes. Further, it is robust and outperforms state-of-the-art hypergraph learning methods on a wide range of tasks and datasets. Finally, we demonstrate the effectiveness of HyperMSG in learning multimodal relations through detailed experimentation on a challenging multimedia dataset.
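The two-level message passing scheme (within each hyperedge, then across hyperedges) can be sketched with uniform mean aggregation standing in for HyperMSG's learned, degree-centrality-based attention weights.

```python
import numpy as np

def hypergraph_message_passing(X, hyperedges):
    # Level 1: aggregate node features into one message per hyperedge.
    # Level 2: aggregate hyperedge messages back onto each member node.
    # Uniform means replace HyperMSG's learned attention weights.
    out = np.zeros_like(X, dtype=float)
    counts = np.zeros(len(X))
    for e in hyperedges:
        msg = X[list(e)].mean(axis=0)
        for v in e:
            out[v] += msg
            counts[v] += 1
    return out / np.maximum(counts, 1)[:, None]

X = np.eye(3)                      # one-hot features for 3 nodes
out = hypergraph_message_passing(X, [(0, 1), (1, 2)])
```

Because the update depends only on features and incidence, not on node identity, a scheme of this shape is inductive: it applies unchanged to nodes unseen during training.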
"Adaptive Neural Message Passing for Inductive Learning on Hypergraphs." Devanshu Arya, Deepak K Gupta, Stevan Rudinac, Marcel Worring. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-07-26. DOI: 10.1109/TPAMI.2024.3434483.
Pub Date: 2024-07-26. DOI: 10.1109/TPAMI.2024.3381075
Kristen Grauman, Andrew Westbury, Eugene Byrne, Vincent Cartillier, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Devansh Kukreja, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C V Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/.
"Ego4D: Around the World in 3,000 Hours of Egocentric Video." Kristen Grauman et al. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-07-26. DOI: 10.1109/TPAMI.2024.3381075.
Pub Date : 2024-07-26
DOI: 10.1109/TPAMI.2024.3434417
Xudong Pan, Mi Zhang, Yifan Yan, Shengyao Zhang, Min Yang
High-quality private machine learning (ML) data stored in local data centers has become a key competitive factor for AI corporations. In this paper, we present a novel insider attack called Matryoshka to reveal the possibility of breaking the privacy of ML data even with no exposed interface. Our attack employs a scheduled-to-publish DNN model as a carrier model for covert transmission of secret models which memorize the information of private ML data that otherwise has no interface to the outsider. At the core of our attack, we present a novel parameter sharing approach which exploits the learning capacity of the carrier model for information hiding. Our approach simultaneously achieves: (i) High Capacity - With almost no utility loss of the carrier model, Matryoshka can transmit over 10,000 real-world data samples within a carrier model which has 220× fewer parameters than the total size of the stolen data, and simultaneously transmit multiple heterogeneous datasets or models within a single carrier model under a trivial distortion rate, neither of which can be done with existing steganography techniques; (ii) Decoding Efficiency - after downloading the published carrier model, an outside colluder can exclusively decode the hidden models from the carrier model with only several integer secrets and the knowledge of the hidden model architecture; (iii) Effectiveness - Moreover, almost all the recovered models either perform similarly to models trained independently on the private data, or can be further used to extract memorized raw training data with low error; (iv) Robustness - Information redundancy is naturally implemented to achieve resilience against common post-processing techniques applied to the carrier before its publishing; (v) Covertness - A model inspector with different levels of prior knowledge could hardly differentiate a carrier model from a normal model.
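The decoding property in (ii), recovering a hidden model from the carrier with only integer secrets and the hidden architecture, can be illustrated with a toy sketch. This is not the paper's implementation (Matryoshka co-trains the shared parameters rather than overwriting them); the seeds, sizes, and the simple index-selection scheme below are all illustrative assumptions:

```python
import numpy as np

SECRET = 42              # integer secret shared with the outside colluder (illustrative value)
hidden_shape = (32, 16)  # hidden model architecture, known to the decoder
n_hidden = int(np.prod(hidden_shape))

# Flattened carrier-model parameters (stand-in for a published DNN's weights).
carrier = np.random.default_rng(0).normal(size=100_000)

# Encoding: a secret-seeded permutation picks which carrier slots host the
# hidden model's weights. (The real attack trains these slots jointly with
# the carrier task instead of overwriting them, preserving carrier utility.)
idx = np.random.default_rng(SECRET).permutation(carrier.size)[:n_hidden]
hidden_weights = np.random.default_rng(1).normal(size=n_hidden)  # stand-in for trained secret weights
stego = carrier.copy()
stego[idx] = hidden_weights

# Decoding: the colluder regenerates the same index set from the integer
# secret alone and reshapes the extracted values into the known architecture.
decode_idx = np.random.default_rng(SECRET).permutation(stego.size)[:n_hidden]
recovered = stego[decode_idx].reshape(hidden_shape)
```

Because the permutation is a deterministic function of the integer secret, no side channel beyond the published carrier is needed, matching the "exclusive decoding" property claimed in (ii).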
{"title":"Matryoshka: Exploiting the Over-Parametrization of Deep Learning Models for Covert Data Transmission.","authors":"Xudong Pan, Mi Zhang, Yifan Yan, Shengyao Zhang, Min Yang","doi":"10.1109/TPAMI.2024.3434417","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3434417","url":null,"abstract":"<p><p>High-quality private machine learning (ML) data stored in local data centers becomes a key competitive factor for AI corporations. In this paper, we present a novel insider attack called Matryoshka to reveal the possibility of breaking the privacy of ML data even with no exposed interface. Our attack employs a scheduled-to-publish DNN model as a carrier model for covert transmission of secret models which memorize the information of private ML data that otherwise has no interface to the outsider. At the core of our attack, we present a novel parameter sharing approach which exploits the learning capacity of the carrier model for information hiding. Our approach simultaneously achieves: (i) High Capacity - With almost no utility loss of the carrier model, Matryoshka can transmit over 10,000 real-world data samples within a carrier model which has 220× less parameters than the total size of the stolen data, and simultaneously transmit multiple heterogeneous datasets or models within a single carrier model under a trivial distortion rate, neither of which can be done with existing steganography techniques; (ii) Decoding Efficiency - once downloading the published carrier model, an outside colluder can exclusively decode the hidden models from the carrier model with only several integer secrets and the knowledge of the hidden model architecture; (iii) Effectiveness - Moreover, almost all the recovered models either have similar performance as if it is trained independently on the private data, or can be further used to extract memorized raw training data with low error; (iv) Robustness - Information redundancy is naturally implemented to achieve resilience against common 
post-processing techniques on the carrier before its publishing; (v) Covertness - A model inspector with different levels of prior knowledge could hardly differentiate a carrier model from a normal model.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141768362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-25
DOI: 10.1109/TPAMI.2024.3433607
Minggui Teng, Hanyue Lou, Yixin Yang, Tiejun Huang, Boxin Shi
Creating an image focal stack requires multiple shots that capture images at different depths within the same scene. Such methods are not suitable for scenes undergoing continuous changes. Achieving an all-in-focus image from a single shot poses significant challenges, due to the highly ill-posed nature of rectifying defocus and deblurring from a single image. In this paper, to restore an all-in-focus image, we introduce the neuromorphic focal stack, defined as the neuromorphic signal streams captured by an event or spike camera during a continuous focal sweep. Given an RGB image focused at any distance, we harness the high temporal resolution of neuromorphic signal streams. From these streams, we automatically select refocusing timestamps and reconstruct the corresponding refocused images to form a focal stack. Guided by the neuromorphic signal around the selected timestamps, we merge the focal stack with proper weights and restore a sharp all-in-focus image. We test our method on two distinct neuromorphic cameras. Experimental results from both synthetic and real datasets demonstrate a marked improvement over existing state-of-the-art methods.
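The final step, merging the focal stack with per-pixel weights, can be sketched generically. The sketch below uses a simple Laplacian-energy focus measure as the weight, not the paper's neuromorphic-guided weights; the function name and parameters are illustrative assumptions:

```python
import numpy as np

def merge_focal_stack(stack, eps=1e-12):
    """Sharpness-weighted merge of a focal stack into one all-in-focus image.

    stack: (N, H, W) array of grayscale images refocused at different depths.
    Uses squared discrete-Laplacian response as a per-pixel focus measure;
    flat regions (near-zero response everywhere) fall back to a plain average.
    """
    def laplacian(img):
        # 5-point discrete Laplacian; borders are left at zero response.
        out = np.zeros_like(img)
        out[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1]
                           + img[1:-1, :-2] + img[1:-1, 2:]
                           - 4.0 * img[1:-1, 1:-1])
        return out

    sharpness = np.stack([laplacian(img) ** 2 for img in stack])
    # eps keeps the weights well-defined (and uniform) where all responses vanish.
    weights = (sharpness + eps) / (sharpness + eps).sum(axis=0, keepdims=True)
    return (weights * stack).sum(axis=0)
```

In this generic form, the image that is locally sharpest dominates each pixel of the output; the paper's contribution is to derive better per-image weights from the neuromorphic signal around each selected timestamp instead of from the images alone.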
{"title":"Hybrid All-in-focus Imaging from Neuromorphic Focal Stack.","authors":"Minggui Teng, Hanyue Lou, Yixin Yang, Tiejun Huang, Boxin Shi","doi":"10.1109/TPAMI.2024.3433607","DOIUrl":"10.1109/TPAMI.2024.3433607","url":null,"abstract":"<p><p>Creating an image focal stack requires multiple shots, which captures images at different depths within the same scene. Such methods are not suitable for scenes undergoing continuous changes. Achieving an all-in-focus image from a single shot poses significant challenges, due to the highly ill-posed nature of rectifying defocus and deblurring from a single image. In this paper, to restore an all-in-focus image, we introduce the neuromorphic focal stack, which is defined as neuromorphic signal streams captured by an event/ a spike camera during a continuous focal sweep, aiming to restore an all-in-focus image. Given an RGB image focused at any distance, we harness the high temporal resolution of neuromorphic signal streams. From neuromorphic signal streams, we automatically select refocusing timestamps and reconstruct corresponding refocused images to form a focal stack. Guided by the neuromorphic signal around the selected timestamps, we can merge the focal stack using proper weights and restore a sharp all-in-focus image. We test our method on two distinct neuromorphic cameras. 
Experimental results from both synthetic and real datasets demonstrate a marked improvement over existing state-of-the-art methods.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141763554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}