Pub Date: 2025-12-05
DOI: 10.1109/TPAMI.2025.3640709
Zhiyuan Zhai;Xiaojun Yuan;Xin Wang;Geoffrey Ye Li
Decentralized federated learning (DFL) is an emerging paradigm that enables edge devices to collaboratively train a learning model through device-to-device (D2D) communication, without the coordination of a parameter server (PS). Aggregation weights, also known as mixing weights, are crucial in the DFL process and affect both learning efficiency and accuracy. Conventional designs rely on a central entity to collect all local information and conduct system optimization to obtain appropriate weights. In this paper, we develop a distributed aggregation weight optimization algorithm to align with the decentralized nature of DFL. We analyze convergence by quantitatively capturing the impact of the aggregation weights over decentralized communication networks. Based on the analysis, we then formulate a learning performance optimization problem by designing the aggregation weights to minimize the derived convergence bound. The optimization problem is further transformed into an eigenvalue optimization problem and solved by our proposed subgradient-based algorithm in a distributed fashion. In our algorithm, edge devices only need local information to obtain the optimal aggregation weights through local (D2D) communications, just like the learning itself. Therefore, the optimization, communication, and learning processes can all be conducted in a distributed fashion, which leads to a genuinely distributed DFL system. Numerical results demonstrate the superiority of the proposed algorithm in practical DFL deployment.
Title: Decentralized Federated Learning With Distributed Aggregation Weight Optimization
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 3899-3910
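To make the role of aggregation (mixing) weights in the decentralized federated learning abstract above more concrete, here is a minimal sketch of D2D averaging on a hypothetical 5-device ring. It uses Metropolis weights, a standard heuristic that is assumed here purely for illustration; it is not the optimized weight design the abstract proposes. The names `metropolis_weights`, `mix`, and `ring` are illustrative. For a symmetric, doubly stochastic mixing matrix, repeated local averaging drives every device toward the network-wide mean, and the speed of that process depends on the matrix's eigenvalues, which is presumably why the abstract arrives at an eigenvalue optimization problem.

```python
def metropolis_weights(neighbors):
    """Build a symmetric, doubly stochastic mixing matrix from an adjacency map.

    Each device only needs its own degree and its neighbors' degrees.
    """
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            W[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        # The self-weight absorbs the remainder so that each row sums to 1.
        W[i][i] = 1.0 - sum(W[i])
    return W


def mix(W, x):
    """One aggregation round: device i forms a weighted average over its row of W."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]


# Hypothetical topology: 5 devices on a ring, each talking to two neighbors.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
W = metropolis_weights(ring)

# Scalar stand-ins for local model parameters.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
target = sum(x) / len(x)

for _ in range(100):
    x = mix(W, x)
```

After enough rounds every entry of `x` is close to `target`. A different choice of weights on the same topology would change how many rounds that takes, which is the quantity a weight optimization would aim to improve.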
Transformers have shown remarkable performance in 3D medical image segmentation, but their high computational requirements and need for large amounts of labeled data limit their applicability. To address these challenges, we consider two crucial aspects: model efficiency and data efficiency. Specifically, we propose Light-UNETR, a lightweight transformer designed to achieve model efficiency. Light-UNETR features a Lightweight Dimension Reductive Attention (LIDR) module, which reduces spatial and channel dimensions while capturing both global and local features via multi-branch attention. Additionally, we introduce a Compact Gated Linear Unit (CGLU) to selectively control channel interaction with minimal parameters. Furthermore, we introduce a Contextual Synergic Enhancement (CSE) learning strategy, which aims to boost the data efficiency of Transformers. It first leverages the extrinsic contextual information to support the learning of unlabeled data with Attention-Guided Replacement, then applies Spatial Masking Consistency that utilizes intrinsic contextual information to enhance the spatial context reasoning for unlabeled data. Extensive experiments on various benchmarks demonstrate the superiority of our approach in both performance and efficiency. For example, with only 10% labeled data on the Left Atrial Segmentation dataset, our method surpasses BCP by 1.43% Jaccard while drastically reducing the FLOPs by 90.8% and parameters by 85.8%.
Title: Harnessing Lightweight Transformer With Contextual Synergic Enhancement for Efficient 3D Medical Image Segmentation
Xinyu Liu;Zhen Chen;Wuyang Li;Chenxin Li;Yixuan Yuan
Pub Date: 2025-12-05
DOI: 10.1109/TPAMI.2025.3640233
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 3852-3867
Pub Date: 2025-12-05
DOI: 10.1109/TPAMI.2025.3640697
Mengyuan Liu;Jinfu Liu;Yongkang Jiang;Bin He
Human action recognition (HAR) in videos has garnered widespread attention due to the rich information in RGB videos. Nevertheless, existing methods for extracting deep features from RGB videos face challenges such as information redundancy, susceptibility to noise, and high storage costs. To address these issues and fully harness the useful information in videos, we propose a novel heatmap pooling network (HP-Net) for action recognition from videos, which extracts information-rich, robust, and concise pooled features of the human body in videos through a feedback pooling module. The extracted pooled features demonstrate obvious performance advantages over the previously obtained pose data and heatmap features from videos. In addition, we design a spatial-motion co-learning module and a text refinement modulation module to integrate the extracted pooled features with other multimodal data, enabling more robust action recognition. Extensive experiments on several benchmarks, namely NTU RGB+D 60, NTU RGB+D 120, Toyota-Smarthome, and uncrewed aerial vehicles (UAV)-Human, consistently verify the effectiveness of our HP-Net, which outperforms existing human action recognition methods.
Title: Heatmap Pooling Network for Action Recognition From RGB Videos
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 3726-3743
Pub Date: 2025-12-05
DOI: 10.1109/TPAMI.2025.3640589
Xiang Xu;Lingdong Kong;Hui Shuai;Wenwei Zhang;Liang Pan;Kai Chen;Ziwei Liu;Qingshan Liu
LiDAR representation learning has emerged as a promising approach to reducing reliance on costly and labor-intensive human annotations. While existing methods primarily focus on spatial alignment between LiDAR and camera sensors, they often overlook the temporal dynamics critical for capturing motion and scene continuity in driving scenarios. To address this limitation, we propose SuperFlow++, a novel framework that integrates spatiotemporal cues in both pretraining and downstream tasks using consecutive LiDAR-camera pairs. SuperFlow++ introduces four key components: (1) a view consistency alignment module to unify semantic information across camera views, (2) a dense-to-sparse consistency regularization mechanism to enhance feature robustness across varying point cloud densities, (3) a flow-based contrastive learning approach that models temporal relationships for improved scene understanding, and (4) a temporal voting strategy that propagates semantic information across LiDAR scans to improve prediction consistency. Extensive evaluations on 11 heterogeneous LiDAR datasets demonstrate that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. Furthermore, by scaling both 2D and 3D backbones during pretraining, we uncover emergent properties that provide deeper insights into developing scalable 3D foundation models. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
Title: Enhanced Spatiotemporal Consistency for Image-to-LiDAR Data Pretraining
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 3819-3834
Graph Neural Networks (GNNs) have achieved remarkable success in machine learning tasks by learning the features of graph data. However, experiments show that vanilla GNNs fail to achieve good classification performance in the field of graph anomaly detection. To address this issue, we propose and theoretically prove that the high-Class Homophily Variance (CHV) characteristic is the reason behind the suboptimal performance of GNN models in anomaly detection tasks. Statistical analysis shows that in most standard node classification datasets, homophily levels are similar across all classes, so CHV is low. In contrast, graph anomaly detection datasets have high CHV, as benign nodes are highly homophilic while anomalies are not, leading to a clear separation. To mitigate its impact, we propose a novel GNN model named Homophily Edge Augment Graph Neural Network (HEAug). Different from previous work, our method emphasizes generating new edges with low CHV value, using the original edges as an auxiliary. HEAug samples homophily adjacency matrices from scratch using a self-attention mechanism, and leverages nodes that are relevant in the feature space but not directly connected in the original graph. Additionally, we modify the loss function to punish the generation of unnecessary heterophilic edges by the model. Extensive comparison experiments demonstrate that HEAug achieved the best performance across eight benchmark datasets, including anomaly detection, edgeless node classification and adversarial attack. We also defined a heterophily attack to increase the CHV value in other graphs, demonstrating the effectiveness of our theory and model in various scenarios.
Title: Homophily Edge Augment Graph Neural Network for High-Class Homophily Variance Learning
Mingjian Guang;Rui Zhang;Dawei Cheng;Xiaoyang Wang;Xin Liu;Jie Yang;Yi Ouyang;Xian Wu;Yefeng Zheng
Pub Date: 2025-12-05
DOI: 10.1109/TPAMI.2025.3640635
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 3835-3851
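The HEAug abstract above contrasts datasets with low and high Class Homophily Variance (CHV) but does not give a formula. The sketch below assumes one plausible reading: the homophily of a class is the fraction of edge endpoints at nodes of that class whose neighbor carries the same label, and CHV is the population variance of those per-class values. Both the definition and the toy graph are assumptions for illustration, not the paper's exact formulation.

```python
from statistics import pvariance


def class_homophily(edges, labels):
    """Per-class fraction of neighbor links that stay within the class.

    `edges` is a list of undirected (u, v) pairs; `labels` maps node -> class.
    """
    same, total = {}, {}
    for u, v in edges:
        # Count each undirected edge once from each endpoint's perspective.
        for a, b in ((u, v), (v, u)):
            c = labels[a]
            total[c] = total.get(c, 0) + 1
            same[c] = same.get(c, 0) + (labels[a] == labels[b])
    return {c: same[c] / total[c] for c in total}


def chv(edges, labels):
    """Assumed CHV: population variance of the per-class homophily values."""
    return pvariance(list(class_homophily(edges, labels).values()))


# Toy anomaly-detection graph: nodes 0-3 are benign (class 0) and densely
# linked to each other; nodes 4-5 are anomalies (class 1) that only attach
# to benign nodes.
labels = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (1, 3), (4, 0), (4, 1), (5, 2), (5, 3)]

homophily = class_homophily(edges, labels)
variance = chv(edges, labels)
```

In this toy graph the benign class is mostly homophilic while the anomaly class has no same-class neighbors, so the per-class values are far apart and the variance is large, matching the qualitative picture in the abstract.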
Pub Date: 2025-12-04
DOI: 10.1109/TPAMI.2025.3640109
Xinwang Liu;Ke Liang;Jun Wang;Suyuan Liu;Xiangke Wang;Huaimin Wang
Multi-view clustering (MVC), an important machine learning task, aims to partition data into distinct groups by leveraging complementary and consistent information across multiple views. It has been widely studied over the last two decades, and the many methods proposed have substantially advanced the field. However, few works comprehensively summarize existing methods and point out the potential challenges in this field for the coming decades. To this end, our survey thoroughly reviews existing MVC methods according to three taxonomies, i.e., techniques, fusion strategies, and scenarios. Specifically, seven typical techniques, four fusion strategies, and five typical scenarios are included. We also collect the commonly used datasets and analyze the performance of typical MVC methods. Moreover, we summarize six application scenarios of existing MVC methods, ranging from computer vision and information retrieval to medical diagnosis and bioinformatics. Finally, we point out seven future directions for the field.
Title: Two Decades of Multi-View Clustering: Taxonomy, Application, and Challenge
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 3744-3764
Pub Date: 2025-12-04
DOI: 10.1109/TPAMI.2025.3640429
Kaiyan Zhang;Xinghui Li;Jingyi Lu;Kai Han
Establishing semantic correspondence is a challenging task in computer vision, aiming to match keypoints with the same semantic information across different images. Benefiting from the rapid development of deep learning, remarkable progress has been made over the past decade. However, a comprehensive review and analysis of this task remains absent. In this paper, we present the first extensive survey of semantic correspondence methods. We first propose a taxonomy to classify existing methods based on the type of their method designs. These methods are then categorized accordingly, and we provide a detailed analysis of each approach. Furthermore, we aggregate and summarize the results of methods in the literature across various benchmarks into a unified comparative table, with detailed configurations to highlight performance variations. Additionally, to provide a detailed understanding of existing methods for semantic matching, we thoroughly conduct controlled experiments to analyze the effectiveness of the components of different methods. Finally, we propose a simple yet effective baseline that achieves state-of-the-art performance on multiple benchmarks, providing a solid foundation for future research in this field. We hope this survey serves as a comprehensive reference and consolidated baseline for future development.
Title: Semantic Correspondence: Unified Benchmarking and a Strong Baseline
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 3911-3930
Pub Date: 2025-12-04
DOI: 10.1109/TPAMI.2025.3640172
Viet Quan Le;Viet Cuong Ta
Learning over dynamic graphs poses major challenges, including capturing the evolving relationships in the graphs. Inspired by the advantages of hyperbolic embeddings in static graphs, hyperbolic space is expected to capture complex interactions in dynamic graphs. However, due to distortion errors in the standard tangent space mappings, hyperbolic methods become more sensitive to noise and lose learning capacity. To address the distortion in tangent space, we previously proposed HMPTGN, a temporal graph network that operates directly on the hyperbolic manifold. In this journal paper, we introduce the HMPTGN+ architecture, an extension of the original HMPTGN with major updates to learn better representations of dynamic graphs based on hyperbolic embeddings. Our framework incorporates a high-order graph neural network for extracting spatial dependencies, a dilated causal attention mechanism for modeling temporal patterns while preserving causality, and a curvature-awareness mechanism to capture dynamic structures. Extensive experiments demonstrate the effectiveness of our proposed HMPTGN+ framework over state-of-the-art baselines in both temporal link prediction and temporal new link prediction tasks.
Title: Toward an Advanced Temporal Graph Network in Hyperbolic Space
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 3868-3884
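The hyperbolic temporal graph abstract above refers to "standard tangent space mappings" without spelling them out. As a point of reference, the sketch below implements the usual exponential and logarithmic maps at the origin of a Poincaré ball with curvature -c. The choice of the Poincaré ball model and the names `exp0` and `log0` are assumptions for illustration; the abstract does not say which hyperbolic model HMPTGN+ uses.

```python
import math


def exp0(v, c=1.0):
    """Map a tangent vector at the origin onto the Poincare ball (curvature -c)."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * a for a in v]


def log0(y, c=1.0):
    """Map a point on the Poincare ball back to the tangent space at the origin."""
    norm = math.sqrt(sum(a * a for a in y))
    if norm == 0.0:
        return list(y)
    scale = math.atanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * a for a in y]


# Round trip: tangent vector -> manifold point -> tangent vector.
v = [0.3, -1.2, 0.5]
y = exp0(v)
v_back = log0(y)
y_norm = math.sqrt(sum(a * a for a in y))
```

Methods that apply Euclidean operations in the tangent space call such a pair of maps around every layer. One practical weakness of this pattern is that `tanh` saturates in floating point for large norms, so points land numerically on the ball's boundary; whether this is the distortion the authors have in mind is not stated in the abstract.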
Pub Date: 2025-12-03
DOI: 10.1109/TPAMI.2025.3639647
Chuyu Wang;Huiting Deng;Dong Liu
Electrical Impedance Tomography (EIT) provides a non-invasive, portable imaging modality with significant potential in medical and industrial applications. Despite its advantages, EIT encounters two primary challenges: the ill-posed nature of its inverse problem and the spatially variable, location-dependent sensitivity distribution. Traditional model-based methods mitigate ill-posedness through regularization but overlook sensitivity variability, while supervised deep learning approaches require extensive training data and lack generalization. Recent developments in neural fields have introduced implicit regularization techniques for image reconstruction; however, these methods often overlook the physical principles underlying EIT, thereby limiting their effectiveness. In this study, we propose PhyNC (Physics-driven Neural Compensation), an unsupervised deep learning framework that incorporates the physical principles of EIT. PhyNC addresses both the ill-posed inverse problem and the sensitivity distribution by dynamically allocating neural representational capacity to regions with lower sensitivity, ensuring accurate and balanced conductivity reconstructions. Extensive evaluations on both simulated and experimental data demonstrate that PhyNC outperforms existing methods in terms of detail preservation and artifact resistance, particularly in low-sensitivity regions. Our approach enhances the robustness of EIT reconstructions and provides a flexible framework that can be adapted to other imaging modalities with similar challenges.
Title: Physics-Driven Neural Compensation for Electrical Impedance Tomography
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 3783-3800
Pub Date : 2025-12-03 DOI: 10.1109/TPAMI.2025.3639593
Linshan Wu;Jiaxin Zhuang;Hao Chen
The scarcity of annotations poses a significant challenge in medical image analysis, because annotation demands extensive effort from radiologists, especially for high-dimensional 3D medical images. Large-scale pre-training has emerged as a promising label-efficient solution, owing to its use of large-scale data, large models, and advanced pre-training techniques. However, its development for medical images remains underexplored. The primary challenge lies in harnessing large-scale unlabeled data and learning high-level semantics without annotations. We observe that 3D medical images exhibit consistent geometric context, i.e., consistent geometric relations between different organs, which offers a promising way to learn consistent representations. Motivated by this, we introduce a simple yet effective Volume Contrast (VoCo) framework to leverage geometric context priors for self-supervision. Given an input volume, we extract base crops from different regions to construct positive and negative pairs for contrastive learning. Then we predict the contextual position of a random crop by contrasting its similarity to the base crops. In this way, VoCo implicitly encodes the inherent geometric context into model representations, facilitating high-level semantic learning without annotations. To assess effectiveness, we (1) introduce PreCT-160K, the largest medical image pre-training dataset to date, which comprises 160K Computed Tomography (CT) volumes covering diverse anatomical structures; (2) investigate scaling laws and propose guidelines for tailoring different model sizes to various medical tasks; (3) build a comprehensive benchmark encompassing 51 medical tasks, including segmentation, classification, registration, and vision-language. Extensive experiments highlight the superiority of VoCo, showcasing promising transferability to unseen modalities and datasets. VoCo notably enhances performance on datasets with limited labeled cases and significantly expedites fine-tuning convergence.
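The position-prediction step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes, for illustration only, that the supervision target for a random crop is the fraction of its volume overlapping each base crop, that the prediction is the clipped cosine similarity between crop features, and that the loss is a mean absolute difference. The function names, the crop layout, and the random stand-in features are all hypothetical.

```python
import numpy as np


def overlap_fractions(crop_lo, crop_hi, base_lo, base_hi):
    """Fraction of the random crop's volume that falls inside each base crop.

    crop_lo, crop_hi: (D,) corner coordinates of the random crop.
    base_lo, base_hi: (N, D) corner coordinates of the N base crops.
    """
    # Per-axis intersection length, clipped at zero for non-overlapping boxes.
    inter = np.clip(np.minimum(crop_hi, base_hi) - np.maximum(crop_lo, base_lo), 0.0, None)
    return inter.prod(axis=1) / np.prod(crop_hi - crop_lo)


def cosine_similarities(query_feat, base_feats):
    """Cosine similarity between one query feature (C,) and N base features (N, C)."""
    q = query_feat / np.linalg.norm(query_feat)
    b = base_feats / np.linalg.norm(base_feats, axis=1, keepdims=True)
    return b @ q


def position_loss(query_feat, base_feats, target):
    """Mean absolute gap between clipped similarities and overlap targets (assumed loss)."""
    pred = np.clip(cosine_similarities(query_feat, base_feats), 0.0, 1.0)
    return float(np.mean(np.abs(pred - target)))


# Hypothetical layout: a 2 x 2 x 1 grid of 64^3 base crops in a 128 x 128 x 64 volume.
base_lo = np.array([[0, 0, 0], [0, 64, 0], [64, 0, 0], [64, 64, 0]], dtype=float)
base_hi = base_lo + 64.0

# A random crop centred on the grid overlaps all four base crops equally.
crop_lo = np.array([32.0, 32.0, 0.0])
crop_hi = crop_lo + 64.0
target = overlap_fractions(crop_lo, crop_hi, base_lo, base_hi)

# Stand-in features; a real pipeline would take them from the encoder.
rng = np.random.default_rng(0)
base_feats = rng.normal(size=(4, 16))
query_feat = rng.normal(size=16)
loss = position_loss(query_feat, base_feats, target)
```

Under these assumptions, a crop that coincides with one base crop gets a one-hot target, while a crop straddling several base crops gets a soft target. This is one plausible way for similarity to base crops to encode contextual position.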
{"title":"Large-Scale 3D Medical Image Pre-Training With Geometric Context Priors","authors":"Linshan Wu;Jiaxin Zhuang;Hao Chen","doi":"10.1109/TPAMI.2025.3639593","journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"48 3","pages":"3801-3818"},"PeriodicalIF":18.6,"publicationDate":"2025-12-03","publicationTypes":"Journal Article"}