
Latest publications: IEEE Transactions on Pattern Analysis and Machine Intelligence

Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation.
Pub Date : 2024-08-16 DOI: 10.1109/TPAMI.2024.3444912
Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. State-of-the-art (SoTA) monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions for both metric depth estimation and surface normal estimation. For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity across various camera models and in large-scale data training. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problem and can be effortlessly plugged into existing monocular models. For surface normal estimation, we propose a joint depth-normal optimization module to distill diverse data knowledge from metric depth, enabling normal estimators to learn beyond normal labels. Equipped with these modules, our depth-normal models can be stably trained on over 16 million images from thousands of camera models with different types of annotations, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Our method currently ranks first on various zero-shot and non-zero-shot benchmarks for metric depth, affine-invariant depth, and surface normal prediction, as shown in Fig. 1. Notably, we surpass the very recent MarigoldDepth and DepthAnything on various depth benchmarks including NYUv2 and KITTI. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology.
The potential benefits extend to downstream tasks, which can be significantly improved by simply plugging in our model. For example, our model relieves the scale drift issues of monocular SLAM (Fig. 3), leading to high-quality metric-scale dense mapping. These applications highlight the versatility of Metric3D v2 models as geometric foundation models. Our project page is at https://JUGGHM.github.io/Metric3Dv2.
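As a minimal illustration of the canonical-camera idea (a sketch, not the authors' implementation), the snippet below rescales depth by the ratio of focal lengths so that labels from cameras with different intrinsics become metrically comparable; the canonical focal length `F_CANONICAL` is a hypothetical value:

```python
import numpy as np

F_CANONICAL = 1000.0  # hypothetical canonical focal length, in pixels


def to_canonical(depth, focal):
    """Rescale a metric depth map into the canonical camera space.

    Scaling depth by f_canonical / f makes labels from different
    cameras comparable during large-scale training.
    """
    return depth * (F_CANONICAL / focal)


def from_canonical(pred_canonical, focal):
    """Undo the canonical transform to recover metric depth for the
    actual camera that captured the image."""
    return pred_canonical * (focal / F_CANONICAL)


# Round trip: a 5 m point seen by a camera with a 500 px focal length.
d = np.array([5.0])
assert np.allclose(from_canonical(to_canonical(d, 500.0), 500.0), d)
```

The de-canonicalization at inference is what restores real-world scale for an unseen camera, given only its focal length.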

Citations: 0
Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective.
Pub Date : 2024-08-16 DOI: 10.1109/TPAMI.2024.3443922
Hui Deng, Tong Zhang, Yuchao Dai, Jiawei Shi, Yiran Zhong, Hongdong Li

Directly regressing the non-rigid shape and camera pose from an individual 2D frame is ill-suited to the Non-Rigid Structure-from-Motion (NRSfM) problem. This frame-by-frame 3D reconstruction pipeline overlooks the inherent spatial-temporal nature of NRSfM, i.e., reconstructing the 3D sequence from the input 2D sequence. In this paper, we propose to solve deep sparse NRSfM from a sequence-to-sequence translation perspective, where the input 2D keypoint sequence is taken as a whole to reconstruct the corresponding 3D keypoint sequence in a self-supervised manner. First, we apply a shape-motion predictor to the input sequence to obtain an initial sequence of shapes and corresponding motions. Then, we propose the Context Layer, which enables the deep learning framework to effectively impose sequence-level constraints based on the structural characteristics of non-rigid sequences. The Context Layer builds modules, with multi-head attention (MHA) at their core together with temporal encoding, that impose self-expressiveness regularity on non-rigid sequences; the two act simultaneously to constrain non-rigid sequences within the deep framework. Experimental results across datasets such as Human3.6M, CMU Mocap, and InterHand prove the superiority of our framework. The code will be made publicly available.
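The code is not yet released, so the toy numpy sketch below only illustrates the two ingredients the Context Layer combines: attention across the frames of a sequence plus a sinusoidal temporal encoding. It uses a single attention head in place of MHA, and all shapes are made up:

```python
import numpy as np


def temporal_encoding(T, d):
    """Sinusoidal temporal encoding: one d-dimensional vector per frame."""
    pos = np.arange(T)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))


def self_attention(X):
    """Scaled dot-product self-attention across the T frames (one head)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)  # softmax over frames
    return w @ X


T, d = 8, 16
rng = np.random.default_rng(0)
shapes = rng.normal(size=(T, d))  # hypothetical per-frame shape codes
ctx = self_attention(shapes + temporal_encoding(T, d))
assert ctx.shape == (T, d)
```

Attention lets every frame's shape code aggregate information from the whole sequence, which is the sequence-level constraint the abstract describes.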

Citations: 0
Quality improvement synthetic aperture radar (SAR) images using compressive sensing (CS) with Moore-Penrose inverse (MPI) and prior from spatial variant apodization (SVA).
Pub Date : 2024-08-16 DOI: 10.1109/TPAMI.2024.3444910
Tao Xiong, Yachao Li, Mengdao Xing

When the locations of non-zero samples are known, the Moore-Penrose inverse (MPI) can be used for the data recovery step of compressive sensing (CS). First, the prior from the locations is used to shrink the measurement matrix in CS. The data can then be recovered by applying the MPI to this shrunken matrix. We also prove that the recovery results of the original CS and of our MPI-based method are mathematically identical. Based on this finding, a novel sidelobe-reduction method for synthetic aperture radar (SAR) and polarimetric SAR (POLSAR) images is studied. The aim of sidelobe reduction is to recover the samples within the mainlobes and suppress those within the sidelobes. In our study, the prior from spatial variant apodization (SVA) is used to determine the locations of the mainlobes and the sidelobes. With CS, the mainlobe areas can be well recovered. Samples within the sidelobe areas are also recovered using background fusion. Our method is suitable for acquired data of large size. The performance of the proposed algorithm is evaluated with acquired spaceborne SAR and airborne POLSAR data. In our experiments, we use 1 m spaceborne SAR data of size 10000 (samples) × 10000 (samples) and 0.3 m POLSAR data of size 10000 (samples) × 26000 (samples) for sidelobe suppression. Furthermore, we verified that our method does not affect the polarization signatures. The effectiveness of the sidelobe suppression is qualitatively examined, and the results were satisfactory.
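The recovery-with-known-support step can be reproduced in a few lines of numpy. In this hedged toy sketch (random Gaussian measurement matrix and hypothetical sizes, not the paper's SAR data), the measurement matrix is shrunk to the known support and the Moore-Penrose inverse recovers the non-zero samples exactly:

```python
import numpy as np

rng = np.random.default_rng(42)
n, m = 64, 20                    # signal length, number of measurements
x = np.zeros(n)
support = [3, 17, 40]            # known locations of the non-zero samples
x[support] = [1.5, -2.0, 0.7]

A = rng.normal(size=(m, n)) / np.sqrt(m)  # CS measurement matrix
y = A @ x                                 # compressed measurements

# Shrink the measurement matrix to the known support, then recover the
# non-zero samples with the Moore-Penrose inverse of the shrunken matrix.
A_s = A[:, support]
x_hat = np.zeros(n)
x_hat[support] = np.linalg.pinv(A_s) @ y

assert np.allclose(x_hat, x, atol=1e-8)
```

Because the shrunken matrix has far more rows than columns (20 × 3 here), the pseudoinverse gives the exact least-squares solution, matching the abstract's claim that MPI recovery coincides with the original CS solution.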

Citations: 0
Safe Reinforcement Learning with Dual Robustness.
Pub Date : 2024-08-15 DOI: 10.1109/TPAMI.2024.3443916
Zeyang Li, Chuxiong Hu, Yunan Wang, Yujie Yang, Shengbo Eben Li

Reinforcement learning (RL) agents are vulnerable to adversarial disturbances, which can deteriorate task performance or break safety specifications. Existing methods either address safety requirements under the assumption of no adversary (e.g., safe RL) or only focus on robustness against performance adversaries (e.g., robust RL). Learning one policy that is both safe and robust under any adversary remains a challenging open problem. The difficulty is how to tackle two intertwined aspects in the worst cases: feasibility and optimality. Optimality is only valid inside a feasible region (i.e., a robust invariant set), while the identification of the maximal feasible region must rely on how the optimal policy is learned. To address this issue, we propose a systematic framework to unify safe RL and robust RL, including the problem formulation, iteration scheme, convergence analysis, and practical algorithm design. The unification is built upon constrained two-player zero-sum Markov games, in which the objective for the protagonist is twofold. For states inside the maximal robust invariant set, the goal is to pursue rewards under the condition of guaranteed safety; for states outside the maximal robust invariant set, the goal is to reduce the extent of constraint violation. A dual policy iteration scheme is proposed, which simultaneously optimizes a task policy and a safety policy. We prove that the iteration scheme converges to the optimal task policy, which maximizes the twofold objective in the worst case, and to the optimal safety policy, which stays as far from the safety boundary as possible. The convergence of the safety policy is established by exploiting the monotone contraction property of safety self-consistency operators, while that of the task policy depends on the transformation of safety constraints into state-dependent action spaces.
By adding two adversarial networks (one for the safety guarantee and the other for task performance), we propose a practical deep RL algorithm for constrained zero-sum Markov games, called dually robust actor-critic (DRAC). Evaluations on safety-critical benchmarks demonstrate that DRAC achieves high performance and persistent safety under all scenarios (no adversary, safety adversary, performance adversary), outperforming all baselines by a large margin.
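The notion of a maximal robust invariant set can be illustrated with a toy fixed-point computation (this is only a hand-made tabular sketch, not the paper's deep dual policy iteration): on a 1-D corridor whose two end states violate the constraint, a state is kept feasible iff some protagonist action keeps every adversary-disturbed successor feasible.

```python
import numpy as np

N = 6                      # states 0..5; states 0 and 5 violate the constraint
actions = [-1, 0, +1]      # protagonist moves
disturbs = [-1, 0, +1]     # adversary moves


def step(s, a, d):
    """Disturbed transition, clipped to the state space."""
    return int(np.clip(s + a + d, 0, N - 1))


# Fixed-point iteration: repeatedly prune states from which no action
# can guarantee a feasible successor under every disturbance.
feasible = np.array([s not in (0, N - 1) for s in range(N)])
changed = True
while changed:
    changed = False
    for s in range(N):
        if not feasible[s]:
            continue
        ok = any(all(feasible[step(s, a, d)] for d in disturbs)
                 for a in actions)
        if not ok:
            feasible[s] = False
            changed = True

print(feasible)  # maximal robust invariant set of this toy corridor
```

Here the interior states survive because the protagonist can always counter the worst disturbance; a stronger adversary would shrink the set further, which is exactly the feasibility side of the worst-case problem the abstract describes.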

Citations: 0
Sharpness-aware Lookahead for Accelerating Convergence and Improving Generalization.
Pub Date : 2024-08-15 DOI: 10.1109/TPAMI.2024.3444002
Chengli Tan, Jiangshe Zhang, Junmin Liu, Yihong Gong

Lookahead is a popular stochastic optimizer that can accelerate the training process of deep neural networks. However, the solutions found by Lookahead often generalize worse than those found by its base optimizers, such as SGD and Adam. To address this issue, we propose Sharpness-Aware Lookahead (SALA), a novel optimizer that aims to identify flat minima that generalize well. SALA divides the training process into two stages. In the first stage, the direction towards flat regions is determined by leveraging a quadratic approximation of the optimization trajectory, without incurring any extra computational overhead. In the second stage, it is instead determined by Sharpness-Aware Minimization (SAM), which is particularly effective in improving generalization at the terminal phase of training. In contrast to Lookahead, SALA retains the benefits of accelerated convergence while also enjoying superior generalization performance compared to the base optimizer. Theoretical analysis of the expected excess risk, as well as empirical results on canonical neural network architectures and datasets, demonstrate the advantages of SALA over Lookahead. It is noteworthy that with approximately 25% more computational overhead than the base optimizer, SALA can achieve the same generalization performance as SAM, which requires twice the training budget of the base optimizer.
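The ingredients SALA combines can be sketched on a toy quadratic loss. This is only a hedged numpy illustration: a Lookahead-style slow/fast loop whose inner steps use SAM's perturb-then-descend update (stage one's trajectory-based direction is omitted, and all step sizes are made up):

```python
import numpy as np

A = np.diag([10.0, 1.0])  # toy quadratic loss L(w) = 0.5 * w^T A w


def grad(w):
    return A @ w


def sam_step(w, lr=0.05, rho=0.05):
    """One SAM step: perturb towards the local worst case, then descend
    using the gradient taken at the perturbed point."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return w - lr * grad(w + eps)


def lookahead_sam(w0, k=5, alpha=0.5, outer_steps=4):
    """Lookahead-style outer loop around SAM inner steps."""
    slow = w0.copy()
    for _ in range(outer_steps):
        fast = slow.copy()
        for _ in range(k):
            fast = sam_step(fast)
        slow = slow + alpha * (fast - slow)  # interpolate slow weights
    return slow


w = lookahead_sam(np.array([1.0, 1.0]))
assert np.linalg.norm(w) < 1.0  # moved towards the minimum at the origin
```

On a real network the gradient would come from a minibatch, but the slow/fast interpolation and the SAM ascent-then-descent structure carry over unchanged.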

Citations: 0
Enhancing Sound Source Localization via False Negative Elimination.
Pub Date : 2024-08-15 DOI: 10.1109/TPAMI.2024.3444029
Zengjie Song, Jiangshe Zhang, Yuxi Wang, Junsong Fan, Zhaoxiang Zhang

Sound source localization aims to localize the objects emitting sound in visual scenes. Recent works that obtain impressive results typically rely on contrastive learning. However, the common practice of randomly sampling negatives in prior art can lead to the false-negative issue, where sounds semantically similar to the visual instance are sampled as negatives and incorrectly pushed away from the visual anchor/query. The resulting misalignment of audio and visual features can yield inferior performance. To address this issue, we propose a novel audio-visual learning framework which is instantiated with two individual learning schemes: self-supervised predictive learning (SSPL) and semantic-aware contrastive learning (SACL). SSPL explores image-audio positive pairs alone to discover semantically coherent similarities between audio and visual features, while a predictive coding module for feature alignment is introduced to facilitate the positive-only learning. In this regard, SSPL acts as a negative-free method to eliminate false negatives. By contrast, SACL is designed to compact visual features and remove false negatives, providing reliable visual anchors and audio negatives for contrast. Different from SSPL, SACL releases the potential of audio-visual contrastive learning, offering an effective alternative to achieve the same goal. Comprehensive experiments demonstrate the superiority of our approach over the state of the art. Furthermore, we highlight the versatility of the learned representation by extending the approach to audio-visual event classification and object detection tasks. Code and models are available at: https://github.com/zjsong/SACL.
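The false-negative idea can be sketched with a tiny numpy example (a hedged illustration, not the released SACL implementation): an InfoNCE-style loss where candidate negatives sharing the anchor's semantic label are filtered out before contrasting, so semantically matching sounds are not pushed away.

```python
import numpy as np


def info_nce(anchor, positive, negatives, tau=0.1):
    """Standard InfoNCE over one positive and a set of negatives."""
    sims = np.concatenate([[anchor @ positive], negatives @ anchor]) / tau
    sims -= sims.max()  # numerical stability
    p = np.exp(sims) / np.exp(sims).sum()
    return -np.log(p[0])


def semantic_filtered_loss(anchor, positive, negatives, neg_labels,
                           anchor_label, tau=0.1):
    """Drop 'negatives' that share the anchor's semantic label before
    contrasting, eliminating false negatives."""
    keep = negatives[neg_labels != anchor_label]
    return info_nce(anchor, positive, keep, tau)


a = np.array([1.0, 0.0])                    # visual anchor
p = np.array([1.0, 0.0])                    # true audio positive
negs = np.array([[1.0, 0.0], [0.0, 1.0]])   # first "negative" is a false one
labels = np.array([0, 1])                   # hypothetical semantic labels

assert semantic_filtered_loss(a, p, negs, labels, anchor_label=0) < info_nce(a, p, negs)
```

With the false negative present, the loss stays high even for a perfect match; filtering it out lets the anchor align with its positive, which is the misalignment problem the abstract identifies.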

Citations: 0
Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection.
Pub Date : 2024-08-14 DOI: 10.1109/TPAMI.2024.3443335
Yifan Zhang, Zhiyu Zhu, Junhui Hou, Dapeng Wu

The Detection Transformer (DETR) has revolutionized the design of CNN-based object detection systems, showcasing impressive performance. However, its potential in the domain of multi-frame 3D object detection remains largely unexplored. In this paper, we present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection by addressing three key aspects specifically tailored for this task. First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network, which represents queries as nodes in a graph and enables effective modeling of object interactions within a social context. Second, to compensate for hard cases missing from the encoder's proposal output in the current frame, we use the output of the previous frame to initialize the decoder's query input. Finally, the network struggles to distinguish the positive query from other highly similar queries that are not the best match; such similar queries are insufficiently suppressed and turn into redundant prediction boxes. To address this issue, our proposed IoU regularization term encourages similar queries to become distinct during refinement. Through extensive experiments, we demonstrate the effectiveness of our approach in handling challenging scenarios, while incurring only a minor additional computational overhead. The code is publicly available at https://github.com/Eaphan/STEMD.
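A minimal sketch of an IoU-based regularizer over query boxes (illustrative only; the paper's exact term may differ): summing pairwise IoUs gives a penalty whose minimization pushes near-duplicate query boxes apart.

```python
import numpy as np


def iou(a, b):
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)


def iou_regularizer(boxes):
    """Sum of pairwise IoUs between query boxes: high when several
    queries predict nearly the same box, zero when they are disjoint."""
    n = len(boxes)
    return sum(iou(boxes[i], boxes[j])
               for i in range(n) for j in range(i + 1, n))


overlapping = [[0, 0, 2, 2], [0.1, 0, 2.1, 2], [5, 5, 6, 6]]
spread = [[0, 0, 2, 2], [3, 0, 5, 2], [5, 5, 6, 6]]
assert iou_regularizer(spread) < iou_regularizer(overlapping)
```

Adding such a term to the training loss penalizes redundant near-duplicate predictions without touching the matched positive query.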

Citations: 0
A Survey on Graph Neural Networks for Time Series: Forecasting, Classification, Imputation, and Anomaly Detection.
Pub Date : 2024-08-14 DOI: 10.1109/TPAMI.2024.3443141
Ming Jin, Huan Yee Koh, Qingsong Wen, Daniele Zambon, Cesare Alippi, Geoffrey I Webb, Irwin King, Shirui Pan

Time series are the primary data type used to record dynamic system measurements, and they are generated in great volume by both physical sensors and online processes (virtual sensors). Time series analytics is therefore crucial to unlocking the wealth of information implicit in available data. With the recent advancements in graph neural networks (GNNs), there has been a surge in GNN-based approaches for time series analysis. These approaches can explicitly model inter-temporal and inter-variable relationships, which traditional and other deep neural network-based methods struggle to do. In this survey, we provide a comprehensive review of graph neural networks for time series analysis (GNN4TS), encompassing four fundamental dimensions: forecasting, classification, anomaly detection, and imputation. Our aim is to guide designers and practitioners in understanding GNN4TS, building applications with it, and advancing its research. First, we provide a comprehensive task-oriented taxonomy of GNN4TS. Then, we present and discuss representative research works and introduce mainstream applications of GNN4TS. A comprehensive discussion of potential future research directions completes the survey. This survey, for the first time, brings together a vast array of knowledge on GNN-based time series research, highlighting the foundations, practical applications, and opportunities of graph neural networks for time series analysis.
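The inter-variable modeling that GNN-based approaches offer can be sketched, under heavy simplification, as one round of mean-aggregation message passing over a variable-dependency graph. A generic illustration only; none of the surveyed GNN4TS architectures is this simple, and the adjacency/feature layout is assumed:

```python
def gnn_step(features, adj):
    # One round of mean-aggregation message passing: each variable's new
    # representation averages its own feature vector with those of its
    # neighbors in the dependency graph (adj[i][j] truthy = edge i-j).
    n = len(features)
    out = []
    for i in range(n):
        neigh = [features[j] for j in range(n) if adj[i][j]] + [features[i]]
        dim = len(features[i])
        out.append([sum(v[d] for v in neigh) / len(neigh) for d in range(dim)])
    return out
```

Stacking such steps (with learned weights and nonlinearities) is what lets a GNN propagate information between correlated series, in contrast to per-series models.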

Citations: 0
Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query Complexity.
Pub Date : 2024-08-14 DOI: 10.1109/TPAMI.2023.3347082
Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Zeroth-order (a.k.a. derivative-free) methods are a class of effective optimization methods for solving complex machine learning problems in which gradients of the objective functions are unavailable or computationally prohibitive. Although many zeroth-order methods have recently been developed, these approaches still have two main drawbacks: 1) high function query complexity; 2) poor suitability for problems with complex penalties and constraints. To address these drawbacks, in this paper we propose a class of faster zeroth-order stochastic alternating direction method of multipliers (ADMM) methods (ZO-SPIDER-ADMM) to solve nonconvex finite-sum problems with multiple nonsmooth penalties. Moreover, we prove that the ZO-SPIDER-ADMM methods can achieve a lower function query complexity of [Formula: see text] for finding an ϵ-stationary point, which improves on the existing best nonconvex zeroth-order ADMM methods by a factor of [Formula: see text], where n and d denote the sample size and data dimension, respectively. At the same time, we propose a class of faster zeroth-order online ADMM methods (ZOO-ADMM+) to solve nonconvex online problems with multiple nonsmooth penalties. We also prove that the proposed ZOO-ADMM+ methods achieve a lower function query complexity of [Formula: see text], which improves on the existing best result by a factor of [Formula: see text]. Extensive experimental results on structure adversarial attacks on black-box deep neural networks demonstrate the efficiency of our new algorithms.
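The derivative-free primitive underlying such methods is a gradient estimate built from function queries alone. Below is a minimal two-point Gaussian-smoothing sketch; it is a generic building block, not the paper's variance-reduced SPIDER estimator, and `mu` is an assumed smoothing parameter name:

```python
import random

def zo_gradient(f, x, mu=1e-4):
    # Two-point zeroth-order gradient estimate along a random Gaussian
    # direction u:  g ≈ ((f(x + mu*u) - f(x)) / mu) * u.
    # Unbiased (up to O(mu) smoothing error) for the true gradient,
    # using only two function queries and no derivatives.
    u = [random.gauss(0.0, 1.0) for _ in x]
    x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = (f(x_pert) - f(x)) / mu
    return [scale * ui for ui in u]
```

Averaging many such estimates recovers the gradient in expectation, which is why their per-step query cost (and hence total function query complexity) is the figure of merit the paper improves.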

Citations: 0
EasyDGL: Encode, Train and Interpret for Continuous-Time Dynamic Graph Learning.
Pub Date : 2024-08-14 DOI: 10.1109/TPAMI.2024.3443110
Chao Chen, Haoyu Geng, Nianzu Yang, Xiaokang Yang, Junchi Yan

Dynamic graphs arise in various real-world applications, and modeling their dynamics in the continuous-time domain is often preferred for its flexibility. This paper aims to design an easy-to-use pipeline (named EasyDGL, also owing to its implementation on the DGL toolkit) composed of three modules with both strong fitting ability and interpretability, namely encoding, training, and interpreting: i) a temporal point process (TPP) modulated attention architecture that endows continuous-time resolution with the coupled spatiotemporal dynamics of the graph under edge-addition events; ii) a principled loss composed of a task-agnostic TPP posterior maximization based on observed events and a task-aware loss with a masking strategy over the dynamic graph, where the tasks include dynamic link prediction, dynamic node classification, and node traffic forecasting; iii) interpretation of the outputs (e.g., representations and predictions) via scalable perturbation-based quantitative analysis in the graph Fourier domain, which can comprehensively reflect the behavior of the learned model. Empirical results on public benchmarks show our superior performance on time-conditioned predictive tasks; in particular, EasyDGL can effectively quantify the predictive power of the frequency content that a model learns from evolving graph data.
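Temporally modulated attention of the kind described in (i) can be caricatured as a softmax whose raw scores are discounted by event recency. The exponential-decay form and the parameter names below are illustrative assumptions, not EasyDGL's actual architecture:

```python
import math

def time_decay_attention(scores, event_times, t_now, rate=1.0):
    # Softmax attention where each raw score is penalized in proportion
    # to how long ago its event occurred, so older events receive
    # exponentially smaller weights at query time t_now.
    modulated = [s - rate * (t_now - t) for s, t in zip(scores, event_times)]
    m = max(modulated)                      # shift for numerical stability
    exp = [math.exp(v - m) for v in modulated]
    z = sum(exp)
    return [e / z for e in exp]
```

With equal raw scores, the most recent event dominates the attention distribution; a TPP-modulated design additionally learns how intensity, not just recency, reshapes these weights.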

Citations: 0